handle_fatal_signal (sig=6) in galera::ReplicatorSMM::async_recv
Activity

KennT April 3, 2017 at 1:19 PM
Reproduced with the Codership 5.7 build. Logged the bug with Codership:

Krunal Bauskar April 3, 2017 at 5:29 AM (edited)
Could reproduce it upstream too.
Steps
1. Start a single node and pause the thread inside galera::ReplicatorSMM::async_recv, after the state check has passed.
Pause just before this statement:
while (gu_unlikely((rc = as_->process(recv_ctx, exit_loop))
== -ECANCELED))
2. In parallel, unload the provider. This succeeds.
3. Resume the paused thread. Because the provider has been unloaded, the vtable is no longer valid and we hit the reported abort.
------------------------
The probability of hitting this is very low, since the user has to unload the provider while the server is still booting up. That timing is not a realistic scenario per se, although the use case itself is valid.
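The interleaving in the steps above can be replayed deterministically in a single thread. The sketch below is only an illustration, not Galera code: `ActionSource` here is a hypothetical stand-in, `DemoSource`, `guarded_process`, and `replay_unload_before_process` are made-up names, and the `std::weak_ptr` guard is one assumed way to let the receiver notice that the provider is gone instead of calling through a dead object.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the action source interface (not the Galera class).
struct ActionSource {
    virtual ~ActionSource() = default;
    virtual int process() = 0;
};

// Hypothetical concrete source whose process() always succeeds.
struct DemoSource : ActionSource {
    int process() override { return 0; }
};

// Returns the result of process(), or -1 if the source was already destroyed.
int guarded_process(const std::weak_ptr<ActionSource>& weak_src) {
    // lock() either pins the object for the duration of the call
    // or yields an empty pointer when the object no longer exists.
    if (std::shared_ptr<ActionSource> src = weak_src.lock()) {
        return src->process();
    }
    return -1;
}

// Replays steps 1-3: one call before the unload, one call after it.
std::pair<int, int> replay_unload_before_process() {
    std::shared_ptr<ActionSource> provider = std::make_shared<DemoSource>();
    std::weak_ptr<ActionSource> receiver_view = provider;

    int before = guarded_process(receiver_view);  // provider still loaded
    provider.reset();                             // step 2: unload the provider
    int after = guarded_process(receiver_view);   // step 3: receiver resumes

    assert(receiver_view.expired());
    return std::make_pair(before, after);
}
```

Calling `replay_unload_before_process()` gives a successful result for the first call and the "already unloaded" sentinel for the second, where the unguarded code would dereference a destroyed object.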

KennT April 1, 2017 at 1:17 AM
SET GLOBAL wsrep_provider=none;
is being called on the node (in another thread). In that thread we see ReplicatorSMM::~ReplicatorSMM() running while the other threads are still using the object.
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f77d2b44049 in fifo_flush (q=0x7f77da3b7000) at galerautils/src/gu_fifo.c:208
#2 0x00007f77d2b44c37 in gu_fifo_destroy (queue=0x7f77da3b7000) at galerautils/src/gu_fifo.c:490
#3 0x00007f77d2c887f7 in gcs_destroy (conn=0x7f77db01f000) at gcs/src/gcs.cpp:1529
#4 0x00007f77d2cea369 in galera::Gcs::~Gcs (this=0x7f77d47c4100, __in_chrg=<optimized out>) at galera/src/galera_gcs.hpp:93
#5 0x00007f77d2ce051f in galera::ReplicatorSMM::~ReplicatorSMM (this=0x7f77d47c3c00, __in_chrg=<optimized out>) at galera/src/replicator_smm.cpp:307
#6 0x00007f77d2ce08f8 in galera::ReplicatorSMM::~ReplicatorSMM (this=0x7f77d47c3c00, __in_chrg=<optimized out>) at galera/src/replicator_smm.cpp:326
#7 0x00007f77d2cfef25 in galera_tear_down (gh=0x7f77d63f7480) at galera/src/wsrep_provider.cpp:103
#8 0x0000000001daeb45 in wsrep_unload (hptr=0x7f77d63f7480) at /ssd/ramesh/perf/pxc-perf/wsrep/wsrep_loader.c:221
#9 0x0000000000e9ee45 in wsrep_deinit () at /ssd/ramesh/perf/pxc-perf/sql/wsrep_mysqld.cc:1206
#10 0x0000000000eaae5b in wsrep_provider_update (self=0x2d056c0 <Sys_wsrep_provider>, thd=0x7f77a5c19000, type=OPT_GLOBAL) at /ssd/ramesh/perf/pxc-perf/sql/wsrep_var.cc:363
#11 0x000000000147dcd9 in sys_var::update (this=0x2d056c0 <Sys_wsrep_provider>, thd=0x7f77a5c19000, var=0x7f77a5c2bcb8) at /ssd/ramesh/perf/pxc-perf/sql/set_var.cc:184
#12 0x000000000147f3a6 in set_var::update (this=0x7f77a5c2bcb8, thd=0x7f77a5c19000) at /ssd/ramesh/perf/pxc-perf/sql/set_var.cc:816
#13 0x000000000147ec69 in sql_set_variables (thd=0x7f77a5c19000, var_list=0x7f77a5c1ba68, free_joins=true) at /ssd/ramesh/perf/pxc-perf/sql/set_var.cc:672
#14 0x000000000154e0ee in mysql_execute_command (thd=0x7f77a5c19000, first_level=true) at /ssd/ramesh/perf/pxc-perf/sql/sql_parse.cc:4537
#15 0x0000000001555bd6 in mysql_parse (thd=0x7f77a5c19000, parser_state=0x7f77df572f70) at /ssd/ramesh/perf/pxc-perf/sql/sql_parse.cc:6896
#16 0x00000000015580d1 in wsrep_mysql_parse (thd=0x7f77a5c19000, rawbuf=0x7f77a5c2b030 "SET GLOBAL wsrep_provider=none", length=30, parser_state=0x7f77df572f70) at /ssd/ramesh/perf/pxc-perf/sql/sql_parse.cc:7877
#17 0x0000000001546d28 in dispatch_command (thd=0x7f77a5c19000, com_data=0x7f77df573850, command=COM_QUERY) at /ssd/ramesh/perf/pxc-perf/sql/sql_parse.cc:1839
#18 0x0000000001545030 in do_command (thd=0x7f77a5c19000) at /ssd/ramesh/perf/pxc-perf/sql/sql_parse.cc:1176
#19 0x000000000168ba99 in handle_connection (arg=0x7f77b2242ee0) at /ssd/ramesh/perf/pxc-perf/sql/conn_handler/connection_handler_per_thread.cc:312
#20 0x00000000018b3b49 in pfs_spawn_thread (arg=0x7f77d57c3520) at /ssd/ramesh/perf/pxc-perf/storage/perfschema/pfs.cc:2188
#21 0x00007f77df03adf5 in start_thread (arg=0x7f77df574700) at pthread_create.c:308
#22 0x00007f77dd2881ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
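The backtrace above shows the teardown running while receivers may still be inside the object. One conceivable way to order the two is a small gate that receivers enter before touching the provider and that the unload path must close first. This is a sketch under assumptions, not the fix applied in Galera or PXC: `UnloadGate` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical gate: receivers call enter()/leave() around each use of the
// provider; the unload path calls try_close() before destroying it.
class UnloadGate {
public:
    // Returns false once the gate is closed, so no new receiver gets in.
    bool enter() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (closed_) return false;
        ++active_;
        return true;
    }

    // Must be called once for every successful enter().
    void leave() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(active_ > 0);
        --active_;
    }

    // Closes the gate only when no receiver is inside; returns whether it closed.
    bool try_close() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (active_ > 0) return false;
        closed_ = true;
        return true;
    }

private:
    std::mutex mutex_;
    int active_ = 0;
    bool closed_ = false;
};
```

With such a gate, an unload request that arrives while a receiver is inside would be refused or retried, and a receiver that arrives after the gate is closed would exit instead of calling into the destroyed object.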

KennT April 1, 2017 at 1:09 AM
Interesting: looking at the vtable, it looks like the object hasn't been constructed yet. Also, on the other threads, Galera is being shut down, i.e. wsrep_deinit() is being called.
+info vtbl as_
vtable for 'galera::GcsActionSource' @ 0x7f77d2fe4490 (subobject @ 0x7f77d47c41d8):
[0]: 0x7f77d2cc3b00 <galera::ActionSource::~ActionSource()>
[1]: 0x7f77d2cc3b36 <galera::ActionSource::~ActionSource()>
[2]: 0xe715d0 <__cxa_pure_virtual@plt>
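A vtable that shows only the base-class entries, as in the dump above, is what C++ exposes while an object is in its base-class state. That happens both during construction and during destruction, so the dump alone does not tell the two apart. The sketch below uses hypothetical classes (`Source`, `GcsLikeSource`) and a non-pure virtual function so that it runs safely. With a pure virtual function in the base class, the same dispatch would land in `__cxa_pure_virtual` and abort.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class; name() is deliberately NOT pure so the demo is safe.
struct Source {
    explicit Source(std::string& log) : log_(log) {}
    // While this destructor runs, the dynamic type is Source,
    // so name() dispatches to Source::name().
    virtual ~Source() { log_ += name(); }
    virtual const char* name() const { return "Source"; }
    std::string& log_;
};

// Hypothetical derived class standing in for a concrete action source.
struct GcsLikeSource : Source {
    using Source::Source;
    // The derived destructor runs first; the object is still a GcsLikeSource.
    ~GcsLikeSource() override {
        log_ += name();
        log_ += ">";
    }
    const char* name() const override { return "GcsLikeSource"; }
};

// Records which override is reached while alive, in the derived destructor,
// and in the base destructor, in that order.
std::string destruction_trace() {
    std::string log;
    {
        GcsLikeSource s(log);
        log += s.name();
        log += ">";
    }
    return log;
}
```

The last entry in the trace comes from the base-class override, which shows that a thread calling a virtual function on an object another thread is destroying can end up in the base-class slot.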
GDB info
#0 0x00007f77df03f771 in pthread_kill () from /lib64/libpthread.so.0
#1 0x000000000188c2b6 in my_write_core (sig=6) at /ssd/ramesh/perf/pxc-perf/mysys/stacktrace.c:249
#2 0x0000000000e8ddfc in handle_fatal_signal (sig=6) at /ssd/ramesh/perf/pxc-perf/sql/signal_handler.cc:235
#3 <signal handler called>
#4 0x00007f77dd1c75d7 in raise () from /lib64/libc.so.6
#5 0x00007f77dd1c8cc8 in abort () from /lib64/libc.so.6
#6 0x00007f77ddacb9d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#7 0x00007f77ddac9946 in ?? () from /lib64/libstdc++.so.6
#8 0x00007f77ddac9973 in std::terminate() () from /lib64/libstdc++.so.6
#9 0x00007f77ddaca4df in __cxa_pure_virtual () from /lib64/libstdc++.so.6
#10 0x00007f77d2ce0f93 in galera::ReplicatorSMM::async_recv (this=0x7f77d47c3c00, recv_ctx=0x7f77a700f000) at galera/src/replicator_smm.cpp:417
#11 0x00007f77d2cffaee in galera_recv (gh=0x7f77d63f7480, recv_ctx=0x7f77a700f000) at galera/src/wsrep_provider.cpp:244
#12 0x0000000000eb1ea3 in wsrep_replication_process (thd=0x7f77a700f000) at /ssd/ramesh/perf/pxc-perf/sql/wsrep_thd.cc:369
#13 0x0000000000e7e345 in start_wsrep_THD (arg=0xeb1d83 <wsrep_replication_process(THD*)>) at /ssd/ramesh/perf/pxc-perf/sql/mysqld.cc:7134
#14 0x00007f77df03adf5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f77dd2881ad in clone () from /lib64/libc.so.6