Slave server may have gaps in Executed_Gtid_Set when a special case happens

Description

Description:
First: On the slave of a master-slave replication setup, a worker thread executing a transaction fails with the following error: "...Lock wait timeout exceeded; try restarting transaction, Error_code: 1205". The transaction is then retried 10 times, after which the SQL thread and the worker threads exit.

Second: Running show slave status\G shows that the executed relay log position is not consistent with Executed_Gtid_Set.

Third: If we run start slave to continue, gaps appear in Executed_Gtid_Set.

In the end, we believe the slave has lost some transactions.
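
One way to confirm such gaps is to inspect the Executed_Gtid_Set string itself. The following is a minimal Python sketch and not part of the original report: the function name find_gtid_gaps is hypothetical, and it assumes the untagged GTID set text format (uuid:interval[:interval...], with entries for different UUIDs separated by commas). It only reports transaction numbers missing between two intervals of the same UUID.

```python
def find_gtid_gaps(gtid_set):
    """Return {uuid: [(first_missing, last_missing), ...]} for a GTID set string.

    Hypothetical helper: assumes untagged GTIDs in the form
    uuid:1-5:7-9, with entries for different UUIDs separated by commas.
    """
    gaps = {}
    # The server may wrap long sets across lines, so strip each entry.
    for entry in gtid_set.split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        ranges = []
        for interval in intervals:
            # An interval is either "N" or "N-M".
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
        ranges.sort()
        missing = []
        # Compare each interval with the next one for the same UUID.
        for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
            if next_start > prev_end + 1:
                missing.append((prev_end + 1, next_start - 1))
        if missing:
            gaps[uuid] = missing
    return gaps
```

Passing the Executed_Gtid_Set value from show slave status\G to this function returns an empty dict when every UUID has contiguous intervals.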

Slave configuration:
log-slave-updates=1
slave-skip-errors=1007,1008,1050,1060,1061,1062,1068
slave_parallel_workers = 32
slave-parallel-type = LOGICAL_CLOCK
slave_preserve_commit_order = 1
relay_log_recovery = 0
innodb_flush_log_at_trx_commit = 1

How to repeat:
First: Set up a master-slave pair and run the change master command:
(slave_session_1)
reset master;
stop slave;
reset slave;
reset slave all;
change master to Master_Host='127.0.0.1',Master_User='rpl',Master_Port=5518, Master_Password='123456',MASTER_AUTO_POSITION=1;
start slave;
(master_session)
drop database if exists abczyy_test;
create database abczyy_test;
create table abczyy_test.tb1(a int key, b int);

Second: Run the following commands.
(master_session)
reset master;
(slave_session_1)
reset master;
stop slave;
reset slave;
start slave;
(master_session)
truncate table abczyy_test.tb1;
(slave_session_1)
stop slave sql_thread;
(master_session)
insert into abczyy_test.tb1(a,b) values(1,1);
insert into abczyy_test.tb1(a,b) values(2,2);
flush logs;
insert into abczyy_test.tb1(a,b) values(3,3);
insert into abczyy_test.tb1(a,b) values(4,4);
flush logs;
(slave_session_2)
start transaction;
insert into abczyy_test.tb1(a,b) values(2,2);
(slave_session_1)
start slave;
show slave status\G

Third: Wait until the SQL thread and the worker threads exit.
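
To tell when that point has been reached, the text printed by show slave status\G can be checked for a stopped SQL thread and the lock wait timeout error. The following is a minimal Python sketch and not part of the original report: the helper names parse_slave_status and sql_thread_exited are hypothetical, and the parser assumes the usual "Field: value" layout of \G output. Continuation lines of a wrapped Executed_Gtid_Set are not handled specially.

```python
def parse_slave_status(text):
    """Parse the 'Field: value' lines of show slave status\\G into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as GTID sets contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def sql_thread_exited(fields, errno="1205"):
    """True when the SQL thread is stopped and the last SQL error matches errno."""
    return (
        fields.get("Slave_SQL_Running") == "No"
        and fields.get("Last_SQL_Errno") == errno
    )
```

Error number 1205 is the lock wait timeout quoted in the description; a different value can be passed as errno for other failures.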

Environment

None

AFFECTED CS IDs

CS0017480

Activity

Aaditya Dubey 
March 15, 2023 at 8:58 AM

Hi,

On it, Thanks!

Lalit Choudhary 
March 14, 2023 at 12:07 PM

Could you please check if this issue still exists on 8.0.32?

Venkatesh Prasad 
March 14, 2023 at 11:17 AM

Hi! I was not able to reproduce this issue on 8.0.32.

Replication proceeded when I did START REPLICA after releasing the row locks (exit command) from slave_session_2, and no gaps were seen in the Executed_Gtid_Set.

Can you please check if this issue still exists on 8.0.32?

Lalit Choudhary 
April 29, 2021 at 2:07 PM

Thank you for the test and report, Vinicius.

Lalit Choudhary 
April 29, 2021 at 2:05 PM

Upstream bug verified: https://bugs.mysql.com/bug.php?id=95064

Marking this open.

Details

Created April 27, 2021 at 9:00 PM
Updated March 6, 2024 at 10:37 AM