Deadlock with concurrent START SLAVE and backup using PXB


The server can enter a deadlock when START SLAVE is issued immediately after a PXB process starts taking a backup of the server.

It looks like locks are acquired in opposite order during

  1. START SLAVE and

  2. Query SELECT server_uuid, local, replication, storage_engines FROM performance_schema.log_status executed by PXB.

Processlist info:


mysql> show processlist;
+----+--------------------+-----------------+------+---------+------+-----------------------------------------+--------------------------------------------------------------------------------------------+---------+-----------+---------------+
| Id | User               | Host            | db   | Command | Time | State                                   | Info                                                                                       | Time_ms | Rows_sent | Rows_examined |
+----+--------------------+-----------------+------+---------+------+-----------------------------------------+--------------------------------------------------------------------------------------------+---------+-----------+---------------+
|  1 | system user        |                 | NULL | Sleep   |   71 | wsrep aborter idle                      | NULL                                                                                       |   71420 |         0 |             0 |
|  2 | system user        |                 | NULL | Sleep   |   71 | innobase_commit_low (-1)                | NULL                                                                                       |   71420 |         0 |             0 |
|  7 | event_scheduler    | localhost       | NULL | Daemon  |   69 | Waiting on empty queue                  | NULL                                                                                       |   68561 |         0 |             0 |
| 11 | root               | localhost:33142 | NULL | Query   |   60 | Waiting for replica thread to start     | START REPLICA                                                                              |   60010 |         0 |             0 |
| 14 | system user        | connecting host | NULL | Connect |   60 | Connecting to source                    | NULL                                                                                       |   60004 |         0 |             0 |
| 15 | system user        | connecting host | NULL | Query   |   60 | Waiting for the next event in relay log | NULL                                                                                       |   60004 |         0 |             0 |
| 16 | mysql.pxc.sst.user | localhost       | NULL | Query   |   53 | executing                               | SELECT server_uuid, local, replication, storage_engines FROM performance_schema.log_status |   53019 |         0 |             0 |
| 18 | root               | localhost:45368 | NULL | Query   |    0 | init                                    | show processlist                                                                           |       0 |         0 |             0 |
+----+--------------------+-----------------+------+---------+------+-----------------------------------------+--------------------------------------------------------------------------------------------+---------+-----------+---------------+
8 rows in set (0.00 sec)

This issue was seen in PXC, so the processlist shows the query running as the user mysql.pxc.sst.user stuck for 53 seconds, and START SLAVE (shown as START REPLICA) stuck for 60 seconds.
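As a rough illustration of how the two stuck sessions stand out, the sketch below filters processlist rows by elapsed time. The rows are transcribed from the output above (only the columns used here); the 30-second threshold and the function name are assumptions made for this example, not values from the report.

```python
# Rows transcribed from the processlist output; NULL Info is None.
PROCESSLIST = [
    {"Id": 11, "User": "root", "Command": "Query", "Time_ms": 60010,
     "Info": "START REPLICA"},
    {"Id": 15, "User": "system user", "Command": "Query", "Time_ms": 60004,
     "Info": None},
    {"Id": 16, "User": "mysql.pxc.sst.user", "Command": "Query", "Time_ms": 53019,
     "Info": "SELECT server_uuid, local, replication, storage_engines "
             "FROM performance_schema.log_status"},
    {"Id": 18, "User": "root", "Command": "Query", "Time_ms": 0,
     "Info": "show processlist"},
]

STUCK_AFTER_MS = 30_000  # hypothetical threshold, chosen for illustration


def stuck_statements(rows, threshold_ms=STUCK_AFTER_MS):
    """Return rows that run a client statement for at least threshold_ms."""
    return [
        row for row in rows
        if row["Command"] == "Query"
        and row["Info"] is not None
        and row["Time_ms"] >= threshold_ms
    ]


for row in stuck_statements(PROCESSLIST):
    print(row["Id"], row["User"], row["Time_ms"], row["Info"])
```

With these rows, only sessions 11 (START REPLICA) and 16 (the log_status query) are reported.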

Stacktrace file:




  • 09 May 2023, 06:28 AM



Venkatesh Prasad May 9, 2023 at 8:19 AM

Venkatesh Prasad May 9, 2023 at 7:41 AM

Deadlock summary:

  • The SQL thread is holding rli->run_lock and is waiting for s_synced inside wait_until_state(). This returns only after a successful SST, but the SST is stuck waiting for the results of the log_status query.

  • The IO thread is waiting for mi->run_lock in handle_slave_io(), but mi->run_lock is held by START SLAVE.

  • The START SLAVE thread is waiting for rli->run_lock (lock_cond_sql) in start_slave_thread(), which is held by the SQL thread.

  • LOG_STATUS is waiting for channel_map_lock->wrlock(), but it is held by the START SLAVE thread.

SQL thread:
holds: rli->run_lock acquired at line no 7128 in
waits for: SST/PXB

IO thread:
waits for: mi->run_lock, held by START SLAVE

START SLAVE thread:
holds:
1. mi->run_lock, which was reacquired after starting the IO thread; it is usually released in unlock_slave_threads() at the end of start_slave().
2. channel_map_lock held in start_slave_cmd() line no 747 in
waits for: rli->run_lock, held by SQL thread

LOG_STATUS query:
holds: nothing
waits for: channel_map_lock, held by START SLAVE
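The holds/waits-for relationships in the deadlock summary can be read as a wait-for graph. The following is a minimal sketch of that reading, not server code: the session labels and the `find_cycle` helper are names chosen for this example, and the SQL thread's wait on s_synced is modeled as a wait on the LOG_STATUS query because the SST cannot finish until that query returns.

```python
# Each waiter maps to (what it waits for, who currently holds or blocks it).
WAITS_FOR = {
    "START SLAVE": ("rli->run_lock", "SQL thread"),
    "SQL thread": ("s_synced (SST completion)", "LOG_STATUS query"),
    "LOG_STATUS query": ("channel_map_lock", "START SLAVE"),
    "IO thread": ("mi->run_lock", "START SLAVE"),
}


def find_cycle(waits_for, start):
    """Follow waiter -> holder edges from start; return the cycle or None."""
    path = []
    node = start
    while node in waits_for:
        if node in path:
            # Reached a session already on the path: the tail is the cycle.
            return path[path.index(node):]
        path.append(node)
        node = waits_for[node][1]
    return None  # the chain ended at a session that is not waiting


cycle = find_cycle(WAITS_FOR, "IO thread")
print(" -> ".join(cycle + cycle[:1]))
```

Starting from the IO thread, the walk ends in the cycle START SLAVE -> SQL thread -> LOG_STATUS query -> START SLAVE. The IO thread is blocked behind that cycle but is not part of it.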






Created May 8, 2023 at 11:05 AM
Updated March 6, 2024 at 8:41 PM
Resolved August 21, 2023 at 12:22 PM