Concurrent access attempts to non-shareable parallel doublewrite buffer shards

Description

In 8.0, some code paths may result in concurrent threads accessing the same parallel doublewrite shard. Since the shards have no locking due to no-sharing requirement, this results in hard-to-track flushing failures, but luckily not during regular server operation.

The main culprit is InnoDB initialization before the cleaner thread start, e.g. trx_rseg_adjust_rollback_segments called from srv_start. This may result in preflushing attempt by this thread, e.g.

Thread 2 (Thread 0x7f23be7a3700 (LWP 47964)): #0 0x00007f23c35909f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f23b0019724) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 #1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f23b00196d0, cond=0x7f23b00196f8) at pthread_cond_wait.c:502 #2 __pthread_cond_wait (cond=cond@entry=0x7f23b00196f8, mutex=0x7f23b00196d0) at pthread_cond_wait.c:655 #3 0x000056304dd208f1 in os_event::wait (this=this@entry=0x7f23b00196c8) at ../../../percona-server/storage/innobase/include/os0event.h:172 #4 0x000056304dd20a6a in os_event::wait_low (this=0x7f23b00196c8, reset_sig_count=4, reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:135 #5 0x000056304dd20af5 in os_event_wait_low (event=<optimized out>, reset_sig_count=reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:320 #6 0x000056304de4d18e in buf_flush_wait_batch_end (buf_pool=buf_pool@entry=0x0, type=type@entry=BUF_FLUSH_LIST) at ../../../percona-server/storage/innobase/buf/buf0flu.cc:2028 #7 0x000056304de528f9 in buf_flush_sync_all_buf_pools () at ../../../percona-server/storage/innobase/buf/buf0flu.cc:3305 #8 0x000056304dce67ea in log_preflush_pool_modified_pages (log=..., new_oldest=new_oldest@entry=165991) at ../../../percona-server/storage/innobase/log/log0chkp.cc:717 #9 0x000056304dce6853 in log_request_checkpoint (log=..., sync=sync@entry=true, lsn=165991) at ../../../percona-server/storage/innobase/log/log0chkp.cc:644 #10 0x000056304dce2d81 in <lambda(bool)>::operator()(bool) const (__closure=__closure@entry=0x7f23be79b4c0) at ../../../percona-server/storage/innobase/log/log0buf.cc:647 #11 0x000056304dce2eaf in ut_wait_for<log_free_check_wait(log_t&, sn_t)::<lambda(bool)> >(<lambda(bool)>, uint64_t, uint64_t) (condition=..., sleep=<optimized out>, spins_limit=0) at ../../../percona-server/storage/innobase/include/ut0ut.ic:141 #12 0x000056304dce2f0f in log_free_check_wait (log=..., sn=<optimized out>) at ../../../percona-server/storage/innobase/log/log0buf.cc:653 #13 0x000056304dcbfdc7 in log_free_check () at ../../../percona-server/storage/innobase/include/log0log.ic:228 #14 0x000056304ddd9d26 in trx_rseg_create(unsigned int, unsigned long) () at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:373 #15 0x000056304dddb0ce in trx_rseg_add_rollback_segments(unsigned int, unsigned long, Rsegs*, unsigned long*) () at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:518 #16 0x000056304dddb884 in trx_rseg_adjust_rollback_segments (target_rollback_segments=128) at ../../../percona-server/storage/innobase/trx/trx0rseg.cc:656 #17 0x000056304ddb46b9 in srv_start(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at ../../../percona-server/storage/innobase/srv/srv0start.cc:2883 #18 0x000056304dc85853 in innobase_init_files(dict_init_mode_t, List<Plugin_tablespace const>*) () at ../../../percona-server/storage/innobase/handler/ha_innodb.cc:5554 #19 0x000056304dc85f82 in innobase_ddse_dict_init(dict_init_mode_t, unsigned int, List<dd::Object_table const>*, List<Plugin_tablespace const>*) () at ../../../percona-server/storage/innobase/handler/ha_innodb.cc:12783 #20 0x000056304d9ff02c in dd::bootstrap::DDSE_dict_init(THD*, dict_init_mode_t, unsigned int) () at ../../percona-server/sql/dd/impl/bootstrap/bootstrapper.cc:746 #21 0x000056304da010fe in dd::bootstrap::initialize(THD*) () at ../../percona-server/sql/dd/impl/bootstrap/bootstrapper.cc:903 #22 0x000056304d0f5a45 in bootstrap::handle_bootstrap (arg=arg@entry=0x563051161560) at ../../percona-server/sql/bootstrap.cc:347 #23 0x000056304dbbaf1f in pfs_spawn_thread (arg=0x5630510f1240) at ../../../percona-server/storage/perfschema/pfs.cc:2836 #24 0x00007f23c358a6db in start_thread (arg=0x7f23be7a3700) at pthread_create.c:463 #25 0x00007f23c0af988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

while the low log space situation would be at the same time noticed by a log checkpointer thread, and it would try do the same, i.e.

Thread 15 (Thread 0x7f23aa7f6700 (LWP 48240)): #0 0x00007f23c35909f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f23b0019724) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 #1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f23b00196d0, cond=0x7f23b00196f8) at pthread_cond_wait.c:502 #2 __pthread_cond_wait (cond=cond@entry=0x7f23b00196f8, mutex=0x7f23b00196d0) at pthread_cond_wait.c:655 #3 0x000056304dd208f1 in os_event::wait (this=this@entry=0x7f23b00196c8) at ../../../percona-server/storage/innobase/include/os0event.h:172 #4 0x000056304dd20a6a in os_event::wait_low (this=0x7f23b00196c8, reset_sig_count=4, reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:135 #5 0x000056304dd20af5 in os_event_wait_low (event=<optimized out>, reset_sig_count=reset_sig_count@entry=0) at ../../../percona-server/storage/innobase/os/os0event.cc:320 #6 0x000056304de4d18e in buf_flush_wait_batch_end (buf_pool=buf_pool@entry=0x0, type=type@entry=BUF_FLUSH_LIST) at ../../../percona-server/storage/innobase/buf/buf0flu.cc:2028 #7 0x000056304de528f9 in buf_flush_sync_all_buf_pools () at ../../../percona-server/storage/innobase/buf/buf0flu.cc:3305 #8 0x000056304dce67ea in log_preflush_pool_modified_pages (log=..., new_oldest=new_oldest@entry=177545) at ../../../percona-server/storage/innobase/log/log0chkp.cc:717 #9 0x000056304dce6d72 in log_consider_sync_flush (log=...) at ../../../percona-server/storage/innobase/log/log0chkp.cc:764 #10 0x000056304dce7972 in log_checkpointer(log_t*)::{lambda()#1}::operator()() const [clone .isra.34] () at ../../../percona-server/storage/innobase/log/log0chkp.cc:912 #11 0x000056304dce7ab4 in log_checkpointer (log_ptr=0x7f23b86a8fc0) at ../../../percona-server/storage/innobase/log/log0chkp.cc:928 #12 0x000056304dcf1427 in Runnable::operator()<void (*)(log_t*), log_t*> (this=0x7f23b87792a8, f=<optimized out>, args#0=<optimized out>) at ../../../percona-server/storage/innobase/include/os0thread-create.h:86 #13 0x00007f23c143c66f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #14 0x00007f23c358a6db in start_thread (arg=0x7f23aa7f6700) at pthread_create.c:463 #15 0x00007f23c0af988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Other callers - such as buf_flush_list_now_set in debug builds - are suspect too.

This is visible on MTR as an intermittent hang on innodb_undo.i_s_files_32k, innodb.create_tablespace_64k, innodb.create_tablespace_32k as analyzed by and

This appears to be a 8.0 regression caused by interplay of upstream redo logging rewrite and our parallel doublewrite, but check 5.7 too.

Environment

None

Attachments

1
  • 18 Oct 2019, 02:50 PM

Smart Checklist

Activity

Show:

George Lorch August 11, 2020 at 12:42 PM

Invalid now with Oracle parallel doublewrite implementation taken in 8.0.20

Satya Bodapati August 11, 2020 at 8:32 AM

This bug can be closed now. Upstream solved it by having internal mutex on the buffer that holds dblwr pages.

George Lorch August 10, 2020 at 1:54 PM

 and , is this still a thing now that we have taken in Oracle parallel dblwr ?

Slava Sarzhan October 18, 2019 at 2:51 PM

please see the stacktrace.

Done

Details

Assignee

Reporter

Time tracking

4d 29m logged

Affects versions

Priority

Smart Checklist

Created July 19, 2019 at 2:25 PM
Updated March 6, 2024 at 11:59 AM
Resolved November 1, 2019 at 9:33 AM