LP #1734312: WSREP: FSM: no such a transition COMMITTING -> REPLICATING or REPLICATING -> REPLICATING in shift_to call

Description

**Reported in Launchpad by Sveta Smirnova last update 05-12-2017 02:46:00

PXC crashes when executes events, calling stored routine with BEGIN TRANSACTION ... COMMIT block in parallel.

How to repeat:

1. cd YOUR_PXC_INSTALLATION_DIR/mysql-test
2. ./mtr --suite=galera_3nodes --mysqld=--wsrep-provider=`pwd`/../lib/libgalera_smm.so --start galera_certification_ccc &
3. alias mysqlmtr='../bin/mysql -h127.0.0.1 -P13000 -uroot'
4. mysqlmtr -e "set global event_scheduler=1"
5. for i in `seq 1 1 100`; do mysqlmtr -P13000 test -e "create table t_$i(id int not null auto_increment primary key, f1 int) engine=innodb"; mysqlmtr -P13000 test --delimiter='|' -e "create procedure p_$i() begin declare i int; start transaction; set i = 0; while i < 100 do insert into t_$i(f1) values;set i = i+1;end while;commit;end"; mysqlmtr -P13000 test --delimiter='|' -e "create event ev_$i on schedule every 1 minute do call p_$i()"; done
6. Wait for a crash

2017-11-24T10:28:23.468528Z 313 [Note] Event Scheduler: scheduler thread started with id 313
2017-11-24T10:28:27.481413Z 318 [ERROR] WSREP: FSM: no such a transition COMMITTING -> REPLICATING
10:28:27 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=3
max_threads=152
thread_count=6
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61228 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f12e400a790
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f133d4adc0f thread_stack 0x40000
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(my_print_stacktrace+0x2e)[0x103684e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(handle_fatal_signal+0x4a1)[0x959511]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f134fd01390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f134dd9b428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f134dd9d02a]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/../lib/libgalera_smm.so(ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2+0x1ae)[0x7f134925224e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/../lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9replicateEPNS_9TrxHandleEP14wsrep_trx_meta+0xbb)[0x7f1349246bfb]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/../lib/libgalera_smm.so(galera_replicate+0xfb)[0x7f134925a09b]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z15wsrep_replicateP3THD+0x9eb)[0xefae5b]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z14ha_prepare_lowP3THDb+0x76)[0x9c8936]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z15ha_commit_transP3THDbb+0x180)[0x9c7fb0]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z12trans_commitP3THD+0x56)[0xecf826]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x30f8)[0xe2c7b8]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x5d)[0xdb4bdd]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x35c)[0xdb617c]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0x98)[0xdb6cf8]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x268)[0xdb7a58]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head7executeEP3THDb+0x76a)[0xdb28aa]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x70e)[0xdb330e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x48b2)[0xe2df72]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x5d)[0xdb4bdd]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x35c)[0xdb617c]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0x98)[0xdb6cf8]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x268)[0xdb7a58]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head7executeEP3THDb+0x76a)[0xdb28aa]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x70e)[0xdb330e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN14Event_job_data7executeEP3THDb+0x6bf)[0xef4b7f]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN19Event_worker_thread3runEP3THDP28Event_queue_element_for_exec+0x174)[0xfa3074]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(event_worker_thread+0x4f)[0xfa313f]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(pfs_spawn_thread+0x1b4)[0x10a9164]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f134fcf76ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f134de6d3dd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f12e001f1d0): commit
Connection ID (thread ID): 318
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
Writing a core file
safe_process[1046]: Child process: 1047, killed by signal: 6

Environment

None

AFFECTED CS IDs

211388

Smart Checklist

Activity

Show:

Krunal Bauskar January 30, 2018 at 3:20 AM

Yes. It was shipped with 5.7.20.

lpjirasync January 14, 2018 at 7:22 PM

**Comment from Launchpad by: Krunal Bauskar on: 05-12-2017 02:45:59

commit 12a9e13401b4d9e7b781034eddf04ffe957c07d0
Author: Krunal Bauskar <krunal.bauskar@percona.com>
Date: Wed Nov 29 17:17:22 2017 +0530

  • PXC#806: Early read of query-id result in stale trx handle assignment.

  • During an execution of a Stored Procedure (SP) as a background
    action (say through event scheduler), query-id get assigned at
    later stage, before PXC captures it to derive wsrep_next_trx_id.

  • PXC uses this wsrep_next_trx_id as unique id to maintain
    TrxHandle and thread mapping.

  • This early capture of query-id resulted in assigning an invalid
    wsrep_next_trx_id (read 0). Problem starts if multiple threads
    gets the same invalid (read 0) id as their unique id to look
    for TrxHandle that eventually cause sharing of TrxHandle.


Solution:
--------

  • Fixed by ensuring that when PXC reads query_id, it is a valid id.
    If not, then a valid query_id is generated.
    (MySQL flow anyway generates it but at later stage).

  • We could have also corrected it by skipping setting of
    wsrep_next_trx_id at the said stage and waiting for MySQL to
    generate it but that means a drill-down code inside
    the main logic. So keeping it at outer layer
    makes it more manageable.

lpjirasync January 14, 2018 at 7:22 PM

**Comment from Launchpad by: Sveta Smirnova on: 24-11-2017 12:07:06

Actually BEGIN TRANSACTION ... COMMIT is not required. Crash is repeatable with:

for i in `seq 1 1 100`; do mysqlmtr -P13000 test -e "create table t_$i(id int not null auto_increment primary key, f1 int) engine=innodb"; mysqlmtr -P13000 test --delimiter='|' -e "create procedure p_$i() begin declare i int; set i = 0; while i < 100 do insert into t_$i(f1) values;set i = i+1;end while;end"; mysqlmtr -P13000 test --delimiter='|' -e "create event ev_$i on schedule every 1 minute do call p_$i()"; done

lpjirasync January 14, 2018 at 7:22 PM

**Comment from Launchpad by: Sveta Smirnova on: 24-11-2017 11:54:14

Bug is not repeatable with version 5.6

Done

Details

Assignee

Reporter

Fix versions

Priority

Smart Checklist

Created January 14, 2018 at 7:22 PM
Updated March 6, 2024 at 10:36 PM
Resolved January 25, 2018 at 4:06 AM