Modify to end mysqld process when the joiner fails during an SST
Description
Environment
None
Activity
Zsolt Parragi April 10, 2020 at 2:08 PM
Tested on 8.0, doesn't happen.
Duplicate
Details
Assignee
Unassigned
Reporter
KennT (Deactivated)
Created April 8, 2020 at 10:14 AM
Updated March 6, 2024 at 9:38 PM
Resolved April 27, 2020 at 9:19 AM
Start up two nodes (node1 and node2). I am using an older PXB version, so the SST fails with the following error message:
2020-04-08T09:53:23.979792Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2020-04-08T09:53:23.980952Z WSREP_SST: [ERROR] The xtrabackup version is 2.4.19. Needs xtrabackup-2.4.20 or higher to perform SST
2020-04-08T09:53:23.982050Z WSREP_SST: [ERROR] ******************************************************
This is expected. However, the mysqld process does not fully exit, even though the log ends with "Shutdown complete":
2020-04-08T09:53:26.987087Z 2 [Note] WSREP: gcomm: closed
2020-04-08T09:53:26.987118Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-04-08T09:53:26.987174Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2020-04-08T09:53:26.987178Z 0 [Note] WSREP: Trying to continue unpaused monitor
2020-04-08T09:53:26.987181Z 0 [Note] WSREP: Received NON-PRIMARY.
2020-04-08T09:53:26.987201Z 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 0)
2020-04-08T09:53:26.987210Z 0 [Note] WSREP: Received self-leave message.
2020-04-08T09:53:26.987214Z 0 [Note] WSREP: Flow-control interval: [0, 0]
2020-04-08T09:53:26.987216Z 0 [Note] WSREP: Trying to continue unpaused monitor
2020-04-08T09:53:26.987219Z 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2020-04-08T09:53:26.987221Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 0)
2020-04-08T09:53:26.987225Z 0 [Note] WSREP: RECV thread exiting 0: Success
2020-04-08T09:53:26.987289Z 2 [Note] WSREP: recv_thread() joined.
2020-04-08T09:53:26.987298Z 2 [Note] WSREP: Closing replication queue.
2020-04-08T09:53:26.987302Z 2 [Note] WSREP: Closing slave action queue.
2020-04-08T09:53:26.987307Z 2 [Note] WSREP: Closing aborting applier THD: 2
2020-04-08T09:53:26.987367Z 0 [Note] WSREP: wsrep running threads now: 0
2020-04-08T09:53:26.987654Z 0 [Note] WSREP: Waiting for active wsrep applier to exit
2020-04-08T09:53:26.987669Z 0 [Note] WSREP: Service disconnected.
2020-04-08T09:53:26.987672Z 0 [Note] WSREP: Waiting to close threads......
2020-04-08T09:53:31.988541Z 0 [Note] WSREP: Some threads may fail to exit.
2020-04-08T09:53:31.988680Z 0 [Note] Binlog end
2020-04-08T09:53:31.988951Z 0 [Note] /home/kennt/dev/pxc/build-bin/bin/mysqld: Shutdown complete
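The process is still hanging around, waiting for the SST. In the stack traces, the main thread (Thread 1) is blocked in wsrep_sst_wait() on COND_wsrep_sst, while Thread 3 is blocked in pthread_cond_destroy() on the same condition variable during unireg_abort():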
Thread 3 (Thread 0x7fcb95b6b700 (LWP 61652)):
#0 0x00007fcb944ce449 in futex_wait (private=<optimized out>, expected=12, futex_word=0x563a5b14d1c4 <COND_wsrep_sst+36>) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=12, futex_word=0x563a5b14d1c4 <COND_wsrep_sst+36>) at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_cond_destroy (cond=0x563a5b14d1a0 <COND_wsrep_sst>) at pthread_cond_destroy.c:54
#3 0x0000563a5915e041 in native_cond_destroy (cond=0x563a5b14d1a0 <COND_wsrep_sst>) at include/thr_cond.h:122
#4 0x0000563a5915e5b0 in inline_mysql_cond_destroy (that=0x563a5b14d1a0 <COND_wsrep_sst>) at include/mysql/psi/mysql_thread.h:1164
#5 0x0000563a59160b6e in clean_up_mutexes () at sql/mysqld.cc:1841
#6 0x0000563a591600d9 in mysqld_exit (exit_code=1) at sql/mysqld.cc:1544
#7 0x0000563a5916008f in unireg_abort (exit_code=1) at sql/mysqld.cc:1532
#8 0x0000563a59198e69 in wsrep_sst_prepare (msg=0x7fcb95b68620, thd=0x7fcb6c000b40) at sql/wsrep_sst.cc:876
#9 0x0000563a5918c846 in wsrep_view_handler_cb (app_ctx=0x563a5b14fe04 <key_FILE_galera_gvwstate>, recv_ctx=0x7fcb6c000b40, view=0x7fcb6c0123e0, state=0x0, state_len=0, sst_req=0x7fcb95b68620, sst_req_len=0x7fcb95b68628) at sql/wsrep_mysqld.cc:721
#10 0x00007fcb92c87a0f in galera::ReplicatorSMM::process_conf_change (this=0x563a5c744c50, recv_ctx=0x7fcb6c000b40, view_info=..., repl_proto=9, next_state=galera::Replicator::S_CONNECTED, seqno_l=1) at galera/src/replicator_smm.cpp:1628
#11 0x00007fcb92c60cd9 in galera::GcsActionSource::dispatch (this=0x563a5c745308, recv_ctx=0x7fcb6c000b40, act=..., exit_loop=@0x7fcb95b68b6b: false) at galera/src/gcs_action_source.cpp:135
#12 0x00007fcb92c61355 in galera::GcsActionSource::process (this=0x563a5c745308, recv_ctx=0x7fcb6c000b40, exit_loop=@0x7fcb95b68b6b: false) at galera/src/gcs_action_source.cpp:180
#13 0x00007fcb92c80d75 in galera::ReplicatorSMM::async_recv (this=0x563a5c744c50, recv_ctx=0x7fcb6c000b40) at galera/src/replicator_smm.cpp:408
#14 0x00007fcb92ca24ff in galera_recv (gh=0x563a5c6c6bc0, recv_ctx=0x7fcb6c000b40) at galera/src/wsrep_provider.cpp:244
#15 0x0000563a591a5bd2 in wsrep_replication_process (thd=0x7fcb6c000b40) at sql/wsrep_thd.cc:470
#16 0x0000563a5916ae29 in start_wsrep_THD (arg=0x563a591a5ad9 <wsrep_replication_process(THD*)>) at sql/mysqld.cc:7467
#17 0x0000563a59c5f44c in pfs_spawn_thread (arg=0x563a5c7b3ae0) at storage/perfschema/pfs.cc:2198
#18 0x00007fcb944c86db in start_thread (arg=0x7fcb95b6b700) at pthread_create.c:463
#19 0x00007fcb938b288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fcb929ac700 (LWP 61645)):
#0 0x00007fcb944ce9f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x563a5c7a4868) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x563a5c7a47e0, cond=0x563a5c7a4840) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x563a5c7a4840, mutex=0x563a5c7a47e0) at pthread_cond_wait.c:655
#3 0x0000563a5918aebf in native_cond_wait (cond=0x563a5c7a4840, mutex=0x563a5c7a47e0) at include/thr_cond.h:147
#4 0x0000563a5918af45 in my_cond_wait (cond=0x563a5c7a4840, mp=0x563a5c7a47e0) at include/thr_cond.h:202
#5 0x0000563a5918b41c in inline_mysql_cond_wait (that=0x563a5c7a4840, mutex=0x563a5c7a47e0, src_file=0x563a5a234760 "sql/wsrep_mysqld.cc", src_line=416) at include/mysql/psi/mysql_thread.h:1202
#6 0x0000563a5918ba2b in wsrep_pfs_instr_cb (type=WSREP_PFS_INSTR_TYPE_CONDVAR, ops=WSREP_PFS_INSTR_OPS_WAIT, tag=WSREP_PFS_INSTR_TAG_SERVICE_THD_CONDVAR, value=0x563a5c745268, alliedvalue=0x563a5c745258, ts=0x0) at sql/wsrep_mysqld.cc:416
#7 0x00007fcb92ad0fbb in gu::Lock::wait (this=0x7fcb929abd10, cond=...) at galerautils/src/gu_lock.hpp:112
#8 0x00007fcb92c5cdab in galera::ServiceThd::thd_func (arg=0x563a5c745240) at galera/src/galera_service_thd.cpp:37
#9 0x00007fcb944c86db in start_thread (arg=0x7fcb929ac700) at pthread_create.c:463
#10 0x00007fcb938b288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fcb95cae0c0 (LWP 61641)):
#0 0x00007fcb944ce9f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x563a5b14d1c8 <COND_wsrep_sst+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x563a5b14d160 <LOCK_wsrep_sst>, cond=0x563a5b14d1a0 <COND_wsrep_sst>) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x563a5b14d1a0 <COND_wsrep_sst>, mutex=0x563a5b14d160 <LOCK_wsrep_sst>) at pthread_cond_wait.c:655
#3 0x0000563a59196110 in native_cond_wait (cond=0x563a5b14d1a0 <COND_wsrep_sst>, mutex=0x563a5b14d160 <LOCK_wsrep_sst>) at include/thr_cond.h:147
#4 0x0000563a5919614f in my_cond_wait (cond=0x563a5b14d1a0 <COND_wsrep_sst>, mp=0x563a5b14d160 <LOCK_wsrep_sst>) at include/thr_cond.h:202
#5 0x0000563a591963e9 in inline_mysql_cond_wait (that=0x563a5b14d1a0 <COND_wsrep_sst>, mutex=0x563a5b14d160 <LOCK_wsrep_sst>, src_file=0x563a5a2361c0 "sql/wsrep_sst.cc", src_line=257) at include/mysql/psi/mysql_thread.h:1202
#6 0x0000563a59196c42 in wsrep_sst_wait () at sql/wsrep_sst.cc:257
#7 0x0000563a5918ec45 in wsrep_init_startup (first=true) at sql/wsrep_mysqld.cc:1239
#8 0x0000563a5916727b in init_server_components () at sql/mysqld.cc:4872
#9 0x0000563a59169118 in mysqld_main (argc=37, argv=0x563a5c6bce88) at sql/mysqld.cc:5877
#10 0x0000563a5915ddea in main (argc=10, argv=0x7ffc5b28f778) at sql/main.cc:32
#11 0x00007fcb937b2b97 in __libc_start_main (main=0x563a5915ddca <main(int, char**)>, argc=10, argv=0x7ffc5b28f778, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc5b28f768) at ../csu/libc-start.c:310
#12 0x0000563a5915dcea in _start ()
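
The traces above suggest the hang comes from one thread destroying COND_wsrep_sst while the main thread is still waiting on it. One possible way to avoid that pattern is sketched below in Python rather than the server's C++, purely as an illustration: the failing joiner thread records the error and wakes the waiter, and the waiting thread returns the exit code itself. The names SstWaiter, joiner, and run_startup are hypothetical and do not come from the PXC source, and this is not necessarily how the actual fix works.

```python
import threading


class SstWaiter:
    """Hypothetical stand-in for the LOCK_wsrep_sst / COND_wsrep_sst pair."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False
        self._error = None

    def wait(self, timeout=None):
        """Block until the SST completes or fails; return 0 on success, 1 on failure."""
        with self._cond:
            self._cond.wait_for(lambda: self._done, timeout=timeout)
            if not self._done:
                raise TimeoutError("SST did not complete")
            return 0 if self._error is None else 1

    def complete(self, error=None):
        """Record the outcome and wake every thread blocked in wait()."""
        with self._cond:
            self._error = error
            self._done = True
            self._cond.notify_all()


def joiner(waiter, sst_ok):
    # On failure, report the error to the waiter instead of tearing
    # the process down from this thread while the waiter is blocked.
    if sst_ok:
        waiter.complete()
    else:
        waiter.complete(error="SST prepare failed")


def run_startup(sst_ok):
    """Simulate startup: the main thread waits for the joiner's result."""
    waiter = SstWaiter()
    worker = threading.Thread(target=joiner, args=(waiter, sst_ok))
    worker.start()
    exit_code = waiter.wait(timeout=5)
    worker.join()
    return exit_code
```

In this sketch, run_startup(False) returns a non-zero exit code from the waiting thread, so no thread is left blocked on the condition variable when the process exits.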