LP #1491682: GTID replication fails to resume after master disconnect
Activity
Julia Vural December 25, 2024 at 9:54 AM
Doesn't reproduce on newer builds.
Lalit Choudhary July 30, 2018 at 12:31 PM (edited)
Percona Server 5.7.22 also shows a similar issue. Log output is attached.
Steps to reproduce:
1. Set up a master-slave replication pair.
2. Enable Performance Schema instruments on the slave:
/* Enable all instrumentation, but only for the slave SQL thread */
SET @sql_thread = (SELECT thread_id FROM performance_schema.threads WHERE name = 'thread/sql/slave_sql');
UPDATE performance_schema.threads SET instrumented = 'NO' WHERE thread_id != @sql_thread;
UPDATE performance_schema.setup_instruments SET enabled = 'YES', timed = 'YES';
UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name LIKE '%history_long' OR name LIKE '%current';
3. Start a bulk INSERT on the master; after some time, stop the master, start it again, and resume the insert operation (an illustrative workload sketch follows the queries below).
4. Run START SLAVE on the slave (if the slave threads are not already running).
5. Check SHOW SLAVE STATUS\G: Seconds_Behind_Master keeps increasing and, after some time, stalls at a constant value.
6. The Performance Schema stage and statement events show a large wait time on System lock:
/* Let the SQL thread run for a minute or so */
/* Get stage and statement raw details */
SELECT event_id, event_name, timer_wait, lock_time FROM performance_schema.events_statements_history_long WHERE thread_id = @sql_thread;
SELECT event_name, timer_wait, nesting_event_id FROM performance_schema.events_stages_history_long WHERE thread_id = @sql_thread;
/* Get some thread wait summary data */
SELECT event_name, count_star, sum_timer_wait FROM performance_schema.events_waits_summary_by_thread_by_event_name WHERE thread_id = @sql_thread AND sum_timer_wait > 0 ORDER BY sum_timer_wait DESC;
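As referenced in step 3, here is a minimal sketch of the kind of bulk-insert workload that can drive the test; the database, table name, and row counts are illustrative assumptions, not taken from the original report.
/* Hypothetical bulk-insert workload for step 3 (database, table, and sizes are assumptions) */
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE IF NOT EXISTS test.bulk_load (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
INSERT INTO test.bulk_load (payload) VALUES (REPEAT('x', 255));
/* Re-run this statement repeatedly; each execution roughly squares the row count,
   keeping the master busy writing binlog events while it is stopped and restarted */
INSERT INTO test.bulk_load (payload)
  SELECT a.payload FROM test.bulk_load a JOIN test.bulk_load b LIMIT 100000;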
Lalit Choudhary July 30, 2018 at 11:54 AM (edited)
Hi Monty,
Thank you for the report.
This issue is similar to upstream MySQL Bug #72131, where the slave finds it very hard to catch up and there is a large wait time for System lock.
The Performance Schema event logs are attached, along with SHOW PROCESSLIST, SHOW ENGINE INNODB STATUS\G, and SHOW SLAVE STATUS\G output.
Tested with Percona Server 5.6.40.
lpjirasync January 24, 2018 at 8:29 AM
**Comment from Launchpad by: monty solomon on: 05-09-2015 03:56:12
Attaching my.cnf and 001-basic-config.cnf.
my.cnf includes 001-basic-config.cnf.
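For anyone recreating the setup, here is a minimal sketch of how such an include is typically wired up in a MySQL option file; the paths and values below are placeholders, not the reporter's actual layout.
# /etc/my.cnf (illustrative paths and values only)
!include /etc/mysql/conf.d/001-basic-config.cnf
[mysqld]
server-id = 2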
lpjirasync January 24, 2018 at 8:29 AM
**Comment from Launchpad by: Valerii Kravchuk on: 04-09-2015 09:38:10
Please send the content of the my.cnf file from your slave.
Description
**Reported in Launchpad by monty solomon last update 05-09-2015 06:52:00
The slave was unable to connect to the master due to network issues. Once the network issues were resolved, the slave did not resume replication even though SHOW SLAVE STATUS listed "Waiting for master to send event".
We sometimes see similar issues when the master is restarted.
The issue appears to be specific to GTID replication.
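For context, a minimal sketch of the kind of GTID-based (auto-positioning) replication setup this report assumes; the host, user, and password are placeholders, and gtid_mode and enforce_gtid_consistency are assumed to be ON on both servers.
/* Placeholders only; MASTER_AUTO_POSITION = 1 is what makes the channel GTID-based */
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_AUTO_POSITION = 1;
START SLAVE;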
Repeated executions of SHOW SLAVE STATUS\G contained the following excerpt:
Slave_IO_State: Waiting for master to send event
Connect_Retry: 15
Master_Log_File: bin.010862
Read_Master_Log_Pos: 2528103
Relay_Log_File: relay.001675
Relay_Log_Pos: 307
Relay_Master_Log_File: bin.010862
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 2528103
Relay_Log_Space: 657
Seconds_Behind_Master: 0
Master_Server_Id: 1156476169
Master_UUID: 9f9bf362-83e9-11e4-a45f-0acf4ba07bb2
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Retrieved_Gtid_Set: 9f9bf362-83e9-11e4-a45f-0acf4ba07bb2:1-1233067
Executed_Gtid_Set: 90fe53a8-1a31-11e5-b854-122ff10e879b:1-6,
9f9bf362-83e9-11e4-a45f-0acf4ba07bb2:1-1233067
After executing "STOP SLAVE; START SLAVE;" the slave resumed replication and Seconds_Behind_Master changed from 0 to 3510
Slave_IO_State: Waiting for master to send event
Connect_Retry: 15
Master_Log_File: bin.010864
Read_Master_Log_Pos: 11834054
Relay_Log_File: relay.001679
Relay_Log_Pos: 16023
Relay_Master_Log_File: bin.010863
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 15825
Relay_Log_Space: 24322598
Seconds_Behind_Master: 3510
Master_Server_Id: 1156476169
Master_UUID: 9f9bf362-83e9-11e4-a45f-0acf4ba07bb2
Slave_SQL_Running_State: System lock
Master_Retry_Count: 86400
Retrieved_Gtid_Set: 9f9bf362-83e9-11e4-a45f-0acf4ba07bb2:1-1239713
Executed_Gtid_Set: 90fe53a8-1a31-11e5-b854-122ff10e879b:1-6,
9f9bf362-83e9-11e4-a45f-0acf4ba07bb2:1-1233654