Slave server may have gaps in Executed_Gtid_Set when a special case happens
Description
First: On a slave server in a master-slave replication setup, a worker thread executing a transaction fails with the following error: "...Lock wait timeout exceeded; try restarting transaction, Error_code: 1205". The transaction is then retried 10 times, after which the SQL thread and the worker threads exit.
Second: We execute show slave status\G and find that the executed relay log position is not consistent with Executed_Gtid_Set.
Third: If we execute start slave to continue, gaps appear in Executed_Gtid_Set.
In the end, we believe the slave has lost some transactions.
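One way to confirm the suspected gaps, offered here as a sketch and not taken from the original report, is to subtract the slave's executed set from the master's using the built-in GTID_SUBTRACT() function. The UUID and interval in the literal below are hypothetical placeholders; replace them with the value of @@GLOBAL.gtid_executed read on the master.

```sql
-- On the master: read the full set of executed GTIDs.
SELECT @@GLOBAL.gtid_executed;

-- On the slave: list GTIDs the master has executed but the slave has not.
-- The string literal is a hypothetical placeholder for the master's value.
SELECT GTID_SUBTRACT(
         '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5',
         @@GLOBAL.gtid_executed
       ) AS missing_on_slave;
```

If the result contains sequence numbers lower than ones already present in the slave's Executed_Gtid_Set, that would point to a gap and not to ordinary replication lag.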
slave configs:
log-slave-updates=1
slave-skip-errors=1007,1008,1050,1060,1061,1062,1068
slave_parallel_workers = 32
slave-parallel-type = LOGICAL_CLOCK
slave_preserve_commit_order = 1
relay_log_recovery = 0
innodb_flush_log_at_trx_commit = 1
How to repeat:
First: Set up a master-slave pair. Run the change master command:
(slave_session_1)
reset master;
stop slave;
reset slave;
reset slave all;
change master to Master_Host='127.0.0.1',Master_User='rpl',Master_Port=5518, Master_Password='123456',MASTER_AUTO_POSITION=1;
start slave;
(master_session)
drop database if exists abczyy_test;
create database abczyy_test;
create table abczyy_test.tb1(a int key, b int);
Second: Run the following commands.
(master_session)
reset master;
(slave_session_1)
reset master;
stop slave;
reset slave;
start slave;
(master_session)
truncate table abczyy_test.tb1;
(slave_session_1)
stop slave sql_thread;
(master_session)
insert into abczyy_test.tb1(a,b) values(1,1);
insert into abczyy_test.tb1(a,b) values(2,2);
flush logs;
insert into abczyy_test.tb1(a,b) values(3,3);
insert into abczyy_test.tb1(a,b) values(4,4);
flush logs;
(slave_session_2)
start transaction;
insert into abczyy_test.tb1(a,b) values(2,2);
(slave_session_1)
start slave;
show slave status\G
Third: Wait until the SQL thread and the worker threads exit.
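While waiting, the applier state can be watched from another slave session. The following is a minimal sketch and not part of the original report; it assumes MySQL 5.7 or later, where performance_schema.replication_applier_status_by_worker is available. The 10 retries mentioned in the description match the default value of slave_transaction_retries, and the wait before each 1205 error is governed by innodb_lock_wait_timeout.

```sql
-- Settings that determine how long the failure takes to surface.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('slave_transaction_retries', 'innodb_lock_wait_timeout');

-- Per-worker applier state, limited to workers that have recorded an error.
SELECT WORKER_ID, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker
WHERE LAST_ERROR_NUMBER <> 0;
```

Once a worker reports error 1205 and SERVICE_STATE is OFF, the threads have exited and show slave status\G can be compared against Executed_Gtid_Set.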
Environment
None
AFFECTED CS IDs
CS0017480
Activity
Aaditya Dubey
March 15, 2023 at 8:58 AM
Hi,
On it, Thanks!
Lalit Choudhary
March 14, 2023 at 12:07 PM
Could you please check if this issue still exists on 8.0.32?
Venkatesh Prasad
March 14, 2023 at 11:17 AM
Hi! I was not able to reproduce this issue on 8.0.32.
Replication proceeded when I did START REPLICA after releasing the row locks (exit command) from slave_session_2, and no gaps were seen in the Executed_Gtid_Set.
Can you please check if this issue still exists on 8.0.32?