LP #1690399: InnoDB performance drop in 5.7 because of the lru_manager

Description

**Reported in Launchpad by jocelyn fournier last update 29-07-2017 04:17:31

Hi,

I'm investigating a performance regression in InnoDB between 5.6 & 5.7.
I noticed a lot of time is spent in os_thread_sleep called from buf_lru_manager_sleep_if_needed().
Is there any way to avoid this? (My configuration uses innodb_buffer_pool_instances=24, so I assume it creates 24 LRU managers as well?)

Poor man's profiler result:
89 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,cond=0x1f74bc0),mutex=<optimized,out>,,at,handle_connection,pfs_spawn_thread,start_thread,clone,??
24 nanosleep,os_thread_sleep,buf_lru_manager_sleep_if_needed,out>),start_thread,clone,??
23 pthread_cond_wait@@GLIBC_2.3.2,wait,reset_sig_count=<optimized,srv_worker_thread,start_thread,clone,??
23 pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,pop_jobs_item,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
1 test_quick_select,mysql_update,Sql_cmd_update::try_single_table_update,Sql_cmd_update::execute,mysql_execute_command,mysql_parse,Query_log_event::do_apply_event,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
1 — ,sigwaitinfo,timer_notify_thread_func,start_thread,clone,
[...]

Thanks!
Jocelyn
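
For reference, in Percona Server 5.7 each buffer pool instance does get its own LRU manager thread (the 1:1 mapping is confirmed in the comments below), so innodb_buffer_pool_instances=24 does mean 24 threads parked in buf_lru_manager_sleep_if_needed() most of the time. A heavily simplified sketch of that per-instance loop, with approximate names and none of the adaptive-sleep logic of the real buf0flu.cc code:

/* Heavily simplified sketch of a per-buffer-pool-instance LRU manager
   thread, loosely modelled on Percona Server 5.7 (buf0flu.cc); names and
   the sleep heuristics are approximate, not the actual source. */
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

std::atomic<bool> srv_shutdown{false};

/* Placeholder for buf_flush_LRU_list(): flush/evict up to
   innodb_lru_scan_depth pages from the tail of one instance's LRU list. */
void flush_lru_tail(int instance_id) { (void)instance_id; }

void lru_manager_thread(int instance_id) {
    /* The real code adapts the sleep time based on free-list pressure;
       a fixed value keeps the sketch short. */
    const auto sleep_time = std::chrono::milliseconds(1000);
    while (!srv_shutdown.load()) {
        /* This wait is what shows up as buf_lru_manager_sleep_if_needed()
           -> os_thread_sleep() in the PMP output above. */
        std::this_thread::sleep_for(sleep_time);
        flush_lru_tail(instance_id);
    }
}

int main() {
    const int n_instances = 24;  /* innodb_buffer_pool_instances */
    std::vector<std::thread> lru_managers;
    for (int i = 0; i < n_instances; ++i)
        lru_managers.emplace_back(lru_manager_thread, i);
    srv_shutdown = true;  /* in the server this only happens at shutdown */
    for (auto &t : lru_managers) t.join();
}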

Environment

None

Activity

lpjirasync January 24, 2018 at 11:22 AM

**Comment from Launchpad by: Launchpad Janitor on: 29-07-2017 04:17:29

[Expired for Percona Server because there has been no activity for 60 days.]

lpjirasync January 24, 2018 at 11:22 AM

**Comment from Launchpad by: Laurynas Biveinis on: 29-05-2017 16:47:45

I see. Back to the original issue: the mapping between LRU managers and buffer pool instances is 1:1 by design, and the only way to reduce the number of LRU managers is to reduce the buffer pool instance count.

Then, mistuned LRU flushing could manifest itself in different ways; the more serious one is a lack of free pages. That shows up in PMP as stack traces involving buf_LRU_get_free_block, and your PMP does not show it. Another would be too-aggressive LRU flushing, but that does not seem likely here either (it is capped by innodb_lru_scan_depth for each ~11GB buffer pool instance, which does not seem excessively high).

Thus, I don't see immediate evidence that your performance drop is directly related to LRU flushing yet. Perhaps you can provide further details about the drop itself?
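
As a rough sanity check of the two numbers above, assuming the default 16 KiB InnoDB page size and the configuration quoted later in the thread (270G buffer pool, 24 instances, innodb_lru_scan_depth=1024), the per-instance size and the LRU scan ceiling work out to:

/* Back-of-the-envelope check of the ceilings implied by the settings
   quoted later in this thread; 16 KiB is the default InnoDB page size,
   the other numbers come from the reporter's configuration. */
#include <cstdio>

int main() {
    const double buffer_pool_gib = 270.0;  /* innodb_buffer_pool_size      */
    const int    instances       = 24;     /* innodb_buffer_pool_instances */
    const int    lru_scan_depth  = 1024;   /* innodb_lru_scan_depth        */
    const double page_kib        = 16.0;   /* default innodb_page_size     */

    const double gib_per_instance = buffer_pool_gib / instances;
    /* Upper bound on pages one LRU manager scans from its LRU tail per pass. */
    const double mib_per_pass_per_instance = lru_scan_depth * page_kib / 1024.0;
    const double mib_per_pass_total        = mib_per_pass_per_instance * instances;

    std::printf("buffer pool per instance  : %.2f GiB\n", gib_per_instance);
    std::printf("LRU scan cap per instance : %.0f MiB per pass\n",
                mib_per_pass_per_instance);
    std::printf("LRU scan cap, 24 instances: %.0f MiB per pass\n",
                mib_per_pass_total);
    return 0;
}

That gives roughly 11.25 GiB per instance and at most ~16 MiB scanned from each LRU tail per pass (~384 MiB across all 24 instances), consistent with the "~11GB bp instance" figure and with the cap not looking excessive.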

lpjirasync January 24, 2018 at 11:22 AM

**Comment from Launchpad by: jocelyn fournier on: 29-05-2017 09:47:53

Hi Laurynas!

Unfortunately, 5.7.18 doesn't really change much in my case.

lpjirasync January 24, 2018 at 11:22 AM

**Comment from Launchpad by: Laurynas Biveinis on: 25-05-2017 07:46:54

Thanks for your bug report. I am not sure that the sleep itself is a problem here - the key for the LRU threads is to flush the right number of pages at the right time, and whether the "right time" is reached by a sleep or by an event wait should be secondary to the choice of heuristics. But perhaps an event wait would allow implementing better heuristics than a sleep.

How does 5.7.18 testing look?
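
To illustrate the sleep-versus-event-wait distinction made above: a fixed sleep always waits out the full interval, while an event wait can be cut short when, for example, a query thread runs out of free pages. A minimal sketch using generic C++ (std::condition_variable standing in for the InnoDB os_event API, which works differently in detail):

/* Minimal illustration of a fixed sleep vs. an event wait that can be
   signalled early. Generic C++, not the InnoDB os_event API. */
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex              lru_mutex;
std::condition_variable lru_wakeup;        /* e.g. signalled when the free list runs low */
bool                    need_flush = false;

/* Fixed sleep: the LRU manager reacts at most once per interval,
   however urgently free pages are needed in the meantime. */
void wait_with_sleep(std::chrono::milliseconds interval) {
    std::this_thread::sleep_for(interval);
}

/* Event wait: still bounded by the same interval, but another thread
   (e.g. a query thread that failed to get a free page) can wake the
   LRU manager immediately by setting need_flush and notifying. */
void wait_with_event(std::chrono::milliseconds interval) {
    std::unique_lock<std::mutex> lock(lru_mutex);
    lru_wakeup.wait_for(lock, interval, [] { return need_flush; });
    need_flush = false;
}

int main() {
    wait_with_sleep(std::chrono::milliseconds(10));
    wait_with_event(std::chrono::milliseconds(10));
}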

lpjirasync January 24, 2018 at 11:22 AM

**Comment from Launchpad by: jocelyn fournier on: 17-05-2017 08:32:41

A few relevant variables used in my case:

innodb_log_file_size=32G
innodb_empty_free_list_algorithm=backoff
innodb_buffer_pool_size=270G
innodb_buffer_pool_instances=24
innodb_lru_scan_depth=1024

Percona version was 5.7.17, I'm currently testing 5.7.18 with the improved LRU manager.
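
On innodb_empty_free_list_algorithm=backoff from the list above: as I understand the Percona Server implementation, it changes what a query thread does when an instance's free list is empty, leaving the flushing to the LRU manager instead of doing a single-page flush itself. A rough, hypothetical sketch of the difference (not the real buf_LRU_get_free_block()):

/* Rough sketch of the difference between the legacy and backoff settings
   of innodb_empty_free_list_algorithm, as I understand it; the real
   buf_LRU_get_free_block() in Percona Server is far more involved. */
#include <chrono>
#include <thread>

enum class EmptyFreeListAlgorithm { LEGACY, BACKOFF };

struct Page {};

Page *try_get_free_page()     { return nullptr; }  /* placeholder: free list empty */
void  single_page_lru_flush() {}                   /* placeholder: flush one page inline */

Page *get_free_block(EmptyFreeListAlgorithm algo) {
    for (int attempt = 0; attempt < 5; ++attempt) {  /* bounded so the sketch terminates */
        if (Page *p = try_get_free_page()) return p;
        if (algo == EmptyFreeListAlgorithm::LEGACY) {
            /* legacy: the query thread evicts/flushes a single page itself */
            single_page_lru_flush();
        } else {
            /* backoff: the query thread just waits and retries, relying on
               the instance's LRU manager thread to refill the free list */
            std::this_thread::sleep_for(std::chrono::microseconds(100 * (attempt + 1)));
        }
    }
    return nullptr;
}

int main() { get_free_block(EmptyFreeListAlgorithm::BACKOFF); }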

Done

Details

Created January 24, 2018 at 11:21 AM
Updated January 24, 2018 at 11:22 AM
Resolved January 24, 2018 at 11:22 AM