Status: Done
Assignee: Unassigned
Reporter: Laurynas Biveinis
Time tracking: 4d 29m logged
Created: July 19, 2019 at 2:25 PM
Updated: March 6, 2024 at 11:59 AM
Resolved: November 1, 2019 at 9:33 AM
In 8.0, some code paths may result in concurrent threads accessing the same parallel doublewrite shard. Because the no-sharing design assumes a single writer per shard, the shards have no locking, so concurrent access leads to hard-to-track flushing failures (fortunately not during regular server operation).
The main culprit is InnoDB initialization running before the page cleaner threads are started, e.g. trx_rseg_adjust_rollback_segments called from srv_start. This may result in a preflush attempt by the initialization thread, e.g.
Thread 2 (Thread 0x7f23be7a3700 (LWP 47964)):
#0  0x00007f23c35909f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f23b0019724) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f23b00196d0, cond=0x7f23b00196f8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=cond@entry=0x7f23b00196f8, mutex=0x7f23b00196d0) at pthread_cond_wait.c:655
#3  0x000056304dd208f1 in os_event::wait (this=this@entry=0x7f23b00196c8) at ../../../percona-server/storage/innobase/include/os0event.h:172
#4  0x000056304dd20a6a in os_event::wait_low (this=0x7f23b00196c8, reset_sig_count=4, reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:135
#5  0x000056304dd20af5 in os_event_wait_low (event=<optimized out>, reset_sig_count=reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:320
#6  0x000056304de4d18e in buf_flush_wait_batch_end (buf_pool=buf_pool@entry=0x0, type=type@entry=BUF_FLUSH_LIST) at ../../../percona-server/storage/innobase/buf/buf0flu.cc:2028
#7  0x000056304de528f9 in buf_flush_sync_all_buf_pools () at ../../../percona-server/storage/innobase/buf/buf0flu.cc:3305
#8  0x000056304dce67ea in log_preflush_pool_modified_pages (log=..., new_oldest=new_oldest@entry=165991) at ../../../percona-server/storage/innobase/log/log0chkp.cc:717
#9  0x000056304dce6853 in log_request_checkpoint (log=..., sync=sync@entry=true, lsn=165991) at ../../../percona-server/storage/innobase/log/log0chkp.cc:644
#10 0x000056304dce2d81 in <lambda(bool)>::operator()(bool) const (__closure=__closure@entry=0x7f23be79b4c0) at ../../../percona-server/storage/innobase/log/log0buf.cc:647
#11 0x000056304dce2eaf in ut_wait_for<log_free_check_wait(log_t&, sn_t)::<lambda(bool)> >(<lambda(bool)>, uint64_t, uint64_t) (condition=..., sleep=<optimized out>, spins_limit=0) at ../../../percona-server/storage/innobase/include/ut0ut.ic:141
#12 0x000056304dce2f0f in log_free_check_wait (log=..., sn=<optimized out>) at ../../../percona-server/storage/innobase/log/log0buf.cc:653
#13 0x000056304dcbfdc7 in log_free_check () at ../../../percona-server/storage/innobase/include/log0log.ic:228
#14 0x000056304ddd9d26 in trx_rseg_create(unsigned int, unsigned long) () at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:373
#15 0x000056304dddb0ce in trx_rseg_add_rollback_segments(unsigned int, unsigned long, Rsegs*, unsigned long*) () at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:518
#16 0x000056304dddb884 in trx_rseg_adjust_rollback_segments (target_rollback_segments=128) at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:656
#17 0x000056304ddb46b9 in srv_start(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at ../../../percona-server/storage/innobase/srv/srv0start.cc:2883
#18 0x000056304dc85853 in innobase_init_files(dict_init_mode_t, List<Plugin_tablespace const>*) () at ../../../percona-server/storage/innobase/handler/ha_innodb.cc:5554
#19 0x000056304dc85f82 in innobase_ddse_dict_init(dict_init_mode_t, unsigned int, List<dd::Object_table const>*, List<Plugin_tablespace const>*) () at ../../../percona-server/storage/innobase/handler/ha_innodb.cc:12783
#20 0x000056304d9ff02c in dd::bootstrap::DDSE_dict_init(THD*, dict_init_mode_t, unsigned int) () at ../../percona-server/sql/dd/impl/bootstrap/bootstrapper.cc:746
#21 0x000056304da010fe in dd::bootstrap::initialize(THD*) () at ../../percona-server/sql/dd/impl/bootstrap/bootstrapper.cc:903
#22 0x000056304d0f5a45 in bootstrap::handle_bootstrap (arg=arg@entry=0x563051161560) at ../../percona-server/sql/bootstrap.cc:347
#23 0x000056304dbbaf1f in pfs_spawn_thread (arg=0x5630510f1240) at ../../../percona-server/storage/perfschema/pfs.cc:2836
#24 0x00007f23c358a6db in start_thread (arg=0x7f23be7a3700) at pthread_create.c:463
#25 0x00007f23c0af988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
while the low log space situation would be noticed at the same time by the log checkpointer thread, which would try to do the same, i.e.
Thread 15 (Thread 0x7f23aa7f6700 (LWP 48240)):
#0  0x00007f23c35909f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f23b0019724) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f23b00196d0, cond=0x7f23b00196f8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=cond@entry=0x7f23b00196f8, mutex=0x7f23b00196d0) at pthread_cond_wait.c:655
#3  0x000056304dd208f1 in os_event::wait (this=this@entry=0x7f23b00196c8) at ../../../percona-server/storage/innobase/include/os0event.h:172
#4  0x000056304dd20a6a in os_event::wait_low (this=0x7f23b00196c8, reset_sig_count=4, reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:135
#5  0x000056304dd20af5 in os_event_wait_low (event=<optimized out>, reset_sig_count=reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:320
#6  0x000056304de4d18e in buf_flush_wait_batch_end (buf_pool=buf_pool@entry=0x0, type=type@entry=BUF_FLUSH_LIST) at ../../../percona-server/storage/innobase/buf/buf0flu.cc:2028
#7  0x000056304de528f9 in buf_flush_sync_all_buf_pools () at ../../../percona-server/storage/innobase/buf/buf0flu.cc:3305
#8  0x000056304dce67ea in log_preflush_pool_modified_pages (log=..., new_oldest=new_oldest@entry=177545) at ../../../percona-server/storage/innobase/log/log0chkp.cc:717
#9  0x000056304dce6d72 in log_consider_sync_flush (log=...) at ../../../percona-server/storage/innobase/log/log0chkp.cc:764
#10 0x000056304dce7972 in log_checkpointer(log_t*)::{lambda()#1}::operator()() const [clone .isra.34] () at ../../../percona-server/storage/innobase/log/log0chkp.cc:912
#11 0x000056304dce7ab4 in log_checkpointer (log_ptr=0x7f23b86a8fc0) at ../../../percona-server/storage/innobase/log/log0chkp.cc:928
#12 0x000056304dcf1427 in Runnable::operator()<void (*)(log_t*), log_t*> (this=0x7f23b87792a8, f=<optimized out>, args#0=<optimized out>) at ../../../percona-server/storage/innobase/include/os0thread-create.h:86
#13 0x00007f23c143c66f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f23c358a6db in start_thread (arg=0x7f23aa7f6700) at pthread_create.c:463
#15 0x00007f23c0af988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Other callers, such as buf_flush_list_now_set in debug builds, are suspect too.
This is visible in MTR as an intermittent hang in innodb_undo.i_s_files_32k, innodb.create_tablespace_64k, and innodb.create_tablespace_32k, as analyzed by @Slava Sarzhan and @Yura Sorokin.
This appears to be an 8.0 regression caused by the interplay between the upstream redo logging rewrite and our parallel doublewrite, but 5.7 should be checked too.