Prevent the replication coordinator thread getting stuck due to MASTER_DELAY while handling partial relay log transactions.

General

Escalation

General

Escalation

Description

Hi,

We're encountering a bug where sql_thread got stuck in Waiting until MASTER_DELAY seconds after master executed event for hours and doesn't apply any transaction.

The issue happens only when parallel replication (slave_parallel_workers > 1) and sql_delay is enabled on the replica. Disabling the parallel replication fixes the issue and sql_thread resumes the work as expected.

mysql> show global variables where Variable_name in ('slave_preserve_commit_order','slave_parallel_type','slave_parallel_workers');
+-----------------------------+---------------+
| Variable_name               | Value         |
+-----------------------------+---------------+
| slave_parallel_type         | LOGICAL_CLOCK |
| slave_parallel_workers      | 4             |
| slave_preserve_commit_order | ON            |
+-----------------------------+---------------+
3 rows in set (0.00 sec)

$ mysql -e"show slave status\G" | grep  'SQL_Delay'
                    SQL_Delay: 14400

The following output confirms the issue, after the relay log rotation the Slave_SQL_Running_State stuck in "Waiting until MASTER_DELAY seconds after master executed event". The replica is stuck even if Seconds_Behind_Master > SQL_Delay.

# mysql -e"show slave status\G" | grep -iE 'seconds|delay'
        Seconds_Behind_Master: 74795
                    SQL_Delay: 14400
          SQL_Remaining_Delay: 8752
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event

The Relay_Master_Log_File and Exec_Master_Log_Pos don't change for hours.

while true; 
do echo "========================";
mysql -e"show slave status\G" | grep -iE 'relay_master_log_file|Exec_Master_Log_Pos'; 
sleep 5; 
done
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 86759722
=============================================

MySQL processlist showing the same issue.

mysql> show processlist;
+------+--------------+-------------------+------+---------+-------+----------------------------------------------------------------+------------------+-----------+---------------+
| Id   | User         | Host              | db   | Command | Time  | State                                                          | Info             | Rows_sent | Rows_examined |
+------+--------------+-------------------+------+---------+-------+----------------------------------------------------------------+------------------+-----------+---------------+
|    1 | system user  |                   | NULL | Connect |  8137 | Waiting for master to send event                               | NULL             |         0 |             0 |
|    2 | system user  |                   | NULL | Connect |  5964 | Waiting until MASTER_DELAY seconds after master executed event | NULL             |         0 |             0 |
|    3 | system user  |                   | NULL | Connect | 75111 | Waiting for an event from Coordinator                          | NULL             |         0 |             0 |
|    4 | system user  |                   | NULL | Connect | 75111 | Waiting for an event from Coordinator                          | NULL             |         0 |             0 |
|    5 | system user  |                   | NULL | Connect | 75111 | Waiting for an event from Coordinator                          | NULL             |         0 |             0 |
|    6 | system user  |                   | NULL | Connect | 75111 | Waiting for an event from Coordinator                          | NULL             |         0 |             0 |
...

From the InnoDB status output (show engine innodb status\G), we could see that sql_thread stuck in Waiting for an event from Coordinator state while keeping the transaction open for 6011 seconds and increasing.

---TRANSACTION 57479112090, ACTIVE 6011 sec
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 3, OS thread handle 140093445318400, query id 17809197 Waiting for an event from Coordinator

The behavior is exactly the same as mentioned in the bug - https://jira.percona.com/browse/PS-3716

After disabling the parallel replication, it was working as expected.

mysql> show global variables where Variable_name in ('slave_preserve_commit_order','slave_parallel_type','slave_parallel_workers');
+-----------------------------+---------------+
| Variable_name               | Value         |
+-----------------------------+---------------+
| slave_parallel_type         | LOGICAL_CLOCK |
| slave_parallel_workers      | 0             |
| slave_preserve_commit_order | ON            |
+-----------------------------+---------------+
3 rows in set (0.00 sec)

SQL_thread started applying events and Seconds_Behind_Master started decreasing.

# while true;
> do echo "========================";
> mysql -e"show slave status\G" | grep -iE 'relay_master_log_file|Exec_Master_Log_Pos';
> sleep 5;
> done
========================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 984748121
========================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 1004479338
========================
        Relay_Master_Log_File: mysql-bin.008275
          Exec_Master_Log_Pos: 1024737695

Server and MySQL Configuration on the affected replica.

cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)

CPU : 16vCPU
RAM : 125G

Percona Server version
rpm -qa | grep -i percona-server
Percona-Server-shared-57-5.7.32-35.1.el7.x86_64
Percona-Server-shared-compat-57-5.7.32-35.1.el7.x86_64
Percona-Server-client-57-5.7.32-35.1.el7.x86_64
Percona-Server-devel-57-5.7.32-35.1.el7.x86_64
Percona-Server-57-debuginfo-5.7.32-35.1.el7.x86_64
Percona-Server-server-57-5.7.32-35.1.el7.x86_64


My.cnf file

[mysql]
port                                = 3306
socket                              = /var/lib/mysql/mysql.sock

[mysqld]
collation-server                    = utf8_unicode_ci
character-set-server                = utf8
sql-mode                            = "STRICT_ALL_TABLES,STRICT_TRANS_TABLES"
plugin-load-add                     = audit_log.so
user                                = mysql
default-storage-engine              = InnoDB
socket                              = /var/lib/mysql/mysql.sock
pid-file                            = /var/lib/mysql/mysql.pid
tmpdir                              = /tmp
explicit_defaults_for_timestamp     = 1
bind-address                        = 0.0.0.0
skip-name-resolve                   = ON
max-connect-errors                  = 1000000
sysdate-is-now                      = 1
datadir                             = /var/lib/mysql
expire-logs-days                    = 4
binlog-format                       = ROW
log_slave_updates
gtid-mode                           = ON
enforce-gtid-consistency            = ON
tmp-table-size                      = 32M
max-heap-table-size                 = 32M
query-cache-type                    = 0
query-cache-size                    = 0
thread-cache-size                   = 128
open-files-limit                    = 65535
table-definition-cache              = 1024
max_allowed_packet                  = 1G
net_buffer_length                   = 1M
thread_cache_size                   = 6000
innodb-flush-method                 = O_DIRECT
innodb-log-files-in-group           = 2
innodb-flush-log-at-trx-commit      = 2
innodb-file-per-table               = 1
innodb_page_cleaners                = 16
log-error                           = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes       = 1
slow-query-log-file                 = /var/lib/mysql/mysql-slow.log
master_info_repository              = TABLE
server-id                           = <ID>
report-host                         = <HOST>
local_infile                        = OFF
automatic_sp_privileges             = FALSE
old_passwords                       = 2
innodb-max-dirty-pages-pct          = 75
read_only                           = 1
super_read_only                     = 0
skip_ssl
log-bin                             = mysql-bin
max_binlog_size                     = 1G
max-connections                     = 5000
table_open_cache                    = 10000
innodb_adaptive_hash_index          = 0
long_query_time                     = 0
thread_pool_size                    = 32
innodb_purge_threads                = 8
innodb_write_io_threads             = 32
innodb_read_io_threads              = 32
innodb_flush_neighbors              = 0
innodb_thread_concurrency           = 32
log_output                          = file
slow_query_log                      = ON
long_query_time                     = 1
log_slow_rate_limit                 = 100
log_slow_rate_type                  = query
log_slow_verbosity                  = full
log_slow_admin_statements           = ON
log_slow_slave_statements           = ON
slow_query_log_always_write_time    = 1
slow_query_log_use_global_control   = all
innodb_monitor_enable               = all
max_user_connections                = 5000
max_connections                     = 5050
thread_cache_size                   = 4000
innodb_autoinc_lock_mode            = 2
innodb_sync_array_size              = 1
innodb_open_files                   = 2000
skip_symbolic_links
table_open_cache_instances          = 4
innodb_log_file_size                = 10737418240
sync_binlog                         = 0
slave_net_timeout                   = 250
innodb_lru_scan_depth               = 4096
audit_log_exclude_accounts          = "'user1'@'XX.XX.%.%','user2'@'XX.XX.%.%',.............."
audit_log_rotate_on_size            = 409600
innodb-io-capacity                  = 1000
innodb-io-capacity-max              = 2000
innodb_flushing_avg_loops           = 200
innodb-buffer-pool-size             = 86G
innodb_buffer_pool_instances        = 8
slave_rows_search_algorithms        = "INDEX_SCAN,HASH_SCAN"
binlog_group_commit_sync_no_delay_count = 8
slave_parallel_type                 = LOGICAL_CLOCK
slave_preserve_commit_order         = 1
slave_parallel_workers              = 4


# The master has following settings.
| binlog_group_commit_sync_delay          | 100                  |
| binlog_group_commit_sync_no_delay_count | 8                    |


# Enable sql_delay on replica after the replication setup.
stop slave;
CHANGE MASTER TO MASTER_DELAY=14400;
start slave;

Environment

None

AFFECTED CS IDs

PRB0040709, CS0016074

Attachments

master-my.cnf

01 Feb, 2021

slave_my.cnf

01 Feb, 2021

Activity

Show:

Venkatesh Prasad

April 7, 2021 at 1:08 PM

Problem
-------
The replication applier thread in a multi-threaded replica randomly gets
stuck in

Waiting until MASTER_DELAY seconds after master executed event

and doesn't apply any transaction for a long time when the replica is
configured with MASTER_DELAY.

Background
----------
After the bugfix of BUG#17326020 (commit: 3f6ed37), when the receiver
thread reconnects to its source using GTID auto positioning while in the
middle of a transaction, it leaves the partial transaction on the
relaylog as is and will fully retrieve the same transaction again. While
doing so, the receiver thread, as per GTID auto-position protocol will
write master's FORMAT_DESCRIPTION_EVENT with log_pos != 0 into the relay
log after each (re)connection.

This FORMAT_DESCRIPTION_EVENT shall look like below in the relay log.

===BEGIN===

at 4
#210215 16:18:05 server id 1 end_log_pos 123 CRC32 0x1120fc25 Start: binlog v 4, server v 5.7.32-35-debug-log created 210215 16:18:05
This Format_description_event appears in a relay log and was generated by the slave thread.
at 123
#210215 16:18:05 server id 1 end_log_pos 154 CRC32 0xa29c149c Previous-GTIDs
[empty]
at 154
#700101 5:30:00 server id 1111 end_log_pos 0 CRC32 0x93c56c74 Rotate to mysql-bin.000001 pos: 4
at 201
#210215 14:19:18 server id 1111 end_log_pos 123 CRC32 0x25802c3f Start: binlog v 4, server v 5.7.32-35-debug-log created 210215 14:19:18 at startup
ROLLBACK/!/;
BINLOG '
jjUqYA9XBAAAdwAAAHsAAAAAAAQANS43LjMyLTM1LWRlYnVnLWxvZwAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACONSpgEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AT8sgCU=
'/!/;
at 320
#210215 16:18:05 server id 0 end_log_pos 367 CRC32 0xdd12fd05 Rotate to mysql-bin.000001 pos: 154
at 367
===END===

When the applier thread starts applying the FORMAT_DESCRIPTION_EVENT (at

position 201), it checks if it is in the middle of a group. If it is, it
will queue to the current worker a QUERY(ROLLBACK) event to make the
worker gracefully finish its work before starting to apply the new
(complete) copy of the group, and only after that it will let the MTS
coordinator to apply the ROTATE_LOG_EVENT (at position 320) in
synchronous mode.

Analysis
--------
In our tests, it was observed that the replication applier thread got
stuck when it was applying the Query_log_event (ROLLBACK) as part of
handling the partial relay log event. On further investigation, it was
found that the Query_log_event (for BEGIN/ROLLBACK) created by the
applier thread had the current time as its timestamp. As a result, the
delay function `slave_sleep()` slept for MASTER_DELAY seconds since the
event timestamp was less than the evaluated current timestamp.

Fix
—
Make the BEGIN/ROLLBACK query to execute with same timestamp as that of
the master's FORMAT_DESCRIPTION_EVENT in
`coord_handle_partial_binlogged_transaction()` function.

Alok Pathak

February 18, 2021 at 4:40 PM

Filed the upstream bug for same issue - https://bugs.mysql.com/bug.php?id=102647

Jira Bot

February 17, 2021 at 6:55 AM

To: @Alok Pathak
CC: @George Lorch

Hi, I'm jira-bot, Percona's Jira automation tool. I've detected that someone from
Percona has made an edit to the Summary field of an issue that you reported.

I'm not sentient (yet) so I'm not sure whether the person fixed a typo, changed
a few words, or completely rewrote the text. In any case, it is Percona Engineering's
intention to make the Summary and Description of an issue as accurate as possible
so that we're fixing the actual problem you're encountering, and to avoid
misunderstandings about symptoms and causes.

If the current Summary does not accurately reflect the problem you are reporting,
or if you feel the change was otherwise inappropriate in some way, please add a
new comment explaining things and we'll address it as soon as we can.

This message will be added only once per issue, regardless of how many times
the Summary is edited.

message-code:summary-edited

Venkatesh Prasad

February 15, 2021 at 12:07 PM

(edited)

Problem

The replication applier thread in a multi-threaded replica can randomly get stuck in

Waiting until MASTER_DELAY seconds after master executed event

and doesn't apply any transaction for a long time when the replica is configured with MASTER_DELAY.

Background

After the bugfix of BUG#17326020, when the receiver thread reconnects to its source using GTID auto positioning while in the middle of a transaction, it leaves the partial transaction on the relaylog as is and will fully retrieve the same transaction again. While doing so, the receiver thread, as per GTID auto-position protocol will write master's FORMAT_DESCRIPTION_EVENT with log_pos != 0 into the relay log after each (re)connection.

This FORMAT_DESCRIPTION_EVENT shall look like below in the relay log.

# at 4
#210215 16:18:05 server id 1 end_log_pos 123 CRC32 0x1120fc25 Start: binlog v 4, server v 5.7.32-35-debug-log created 210215 16:18:05
# This Format_description_event appears in a relay log and was generated by the slave thread.
# at 123
#210215 16:18:05 server id 1 end_log_pos 154 CRC32 0xa29c149c Previous-GTIDs
# [empty]
# at 154
#700101 5:30:00 server id 1111 end_log_pos 0 CRC32 0x93c56c74 Rotate to mysql-bin.000001 pos: 4
# at 201
#210215 14:19:18 server id 1111 end_log_pos 123 CRC32 0x25802c3f Start: binlog v 4, server v 5.7.32-35-debug-log created 210215 14:19:18 at startup
ROLLBACK/*!*/;
BINLOG '
jjUqYA9XBAAAdwAAAHsAAAAAAAQANS43LjMyLTM1LWRlYnVnLWxvZwAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACONSpgEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AT8sgCU=
'/*!*/;
# at 320
#210215 16:18:05 server id 0 end_log_pos 367 CRC32 0xdd12fd05 Rotate to mysql-bin.000001 pos: 154
# at 367

When the applier thread starts applying the FORMAT_DESCRIPTION_EVENT (at position 201), it checks if it is in the middle of a group. If it is, it will queue to the current worker a QUERY(ROLLBACK) event to make the worker gracefully finish its work before starting to apply the new (complete) copy of the group, and only after that it will let the MTS coordinator to apply the ROTATE_LOG_EVENT (at position 320) in synchronous mode.

Analysis

In our tests, it was observed that the replication applier thread got stuck when it was applying the Query_log_event (ROLLBACK) as part of handling the partial relay log event. On further investigation, it was found that the Query_log_event (for BEGIN/ROLLBACK) created by the applier thread had the current time as its timestamp. As a result, the delay function `slave_sleep()` slept for MASTER_DELAY seconds since the event timestamp was less than the evaluated current timestamp.

Proposed Fix

Make the BEGIN/ROLLBACK query to execute with same timestamp as that of the master's FORMAT_DESCRIPTION_EVENT in `coord_handle_partial_binlogged_transaction()` function.

Venkatesh Prasad

February 15, 2021 at 12:03 PM

I was able to reproduce this bug with the following steps, using a debug build.

Slave:
CHANGE MASTER TO master_host='127.0.0.1', master_port=13001, master_user='root', master_auto_position=1;
CHANGE MASTER TO MASTER_DELAY=50;
SET GLOBAL DEBUG='d,stop_io_after_reading_gtid_log_event';
START SLAVE;

Master:
CREATE TABLE t1(i INT) ENGINE=InnoDB;

Here, by using the `stop_io_after_reading_gtid_log_event` debug point, we intentionally make the IO thread to go down after it reads the Gtid_log_event from the master's binary log, so that all on subsequent START SLAVE the receiver thread shall fetch master's FORMAT_DESCRIPTION_EVENT and cause the applier thread to wait for MASTER_DELAY seconds.

mysql> start slave io_thread; select sleep(1); show slave status\G
Query OK, 0 rows affected (0.00 sec)

+----------+
| sleep(1) |
+----------+
|        0 |
+----------+
1 row in set (1.00 sec)

*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 13001
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 394
               Relay_Log_File: slave-relay-bin.000014
                Relay_Log_Pos: 454
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
...
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 329
              Relay_Log_Space: 1091
...
        Seconds_Behind_Master: 3280
...
             Master_Server_Id: 1111
                  Master_UUID: b04f1158-6f6a-11eb-9d24-74d83e29c093
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 30
          SQL_Remaining_Delay: 29
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event
           Master_Retry_Count: 10
...
           Retrieved_Gtid_Set: b04f1158-6f6a-11eb-9d24-74d83e29c093:1
            Executed_Gtid_Set: b04f1158-6f6a-11eb-9d24-74d83e29c093:1
                Auto_Position: 1
...
1 row in set (0.00 sec)

Resize issue view side panel

Done

Details

Assignee

Venkatesh Prasad(Deactivated)

Reporter

Alok Pathak

Labels

Upstream Bug URL

https://bugs.mysql.com/bug.php?id=102647

Needs Review

Yes

Time tracking

4d 7h 30m logged

Fix versions

8.0.23-14 (Q1 2021)

5.7.34-37 (Q2 2021)

Affects versions

5.7.32-35

Priority

Medium

Created January 14, 2021 at 1:29 PM

Updated March 6, 2024 at 10:45 AM

Resolved April 7, 2021 at 1:08 PM

Prevent the replication coordinator thread getting stuck due to MASTER_DELAY while handling partial relay log transactions.

Description

Environment

AFFECTED CS IDs

Attachments

Activity

Problem

Background

Proposed Fix

Details

Assignee

Reporter

Labels

Upstream Bug URL

Needs Review

Time tracking

Fix versions

Affects versions

Priority

Flag notifications

Something's gone wrong

Something's gone wrong