Lock leak in PXC could result in deadlock
Description
Environment
It was also observed that a PXC node acting as a replica could enter a deadlock due to a lock leak, where the applier thread gets stuck in the "Waiting for slave mutex on exit" state:
mysql> show processlist;
+----+-------------+-----------------+------+---------+------+-------------------------------------+-------------------+---------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined |
+----+-------------+-----------------+------+---------+------+-------------------------------------+-------------------+---------+-----------+---------------+
| 1 | system user | | NULL | Sleep | 709 | wsrep: aborter idle | NULL | 709726 | 0 | 0 |
| 2 | system user | | NULL | Sleep | 150 | wsrep: committed write set (138302) | NULL | 135824 | 0 | 0 |
| 13 | root | localhost:35282 | test | Query | 195 | checking permissions | start slave | 195292 | 0 | 0 |
| 14 | system user | | NULL | Connect | 381 | Waiting for master to send event | NULL | 381007 | 0 | 0 |
| 15 | system user | | NULL | Connect | 240 | Waiting for slave mutex on exit | NULL | 240706 | 0 | 0 |
| 20 | root | localhost:49788 | test | Query | 11 | starting | show slave status | 11327 | 0 | 0 |
| 23 | root | localhost:58280 | test | Query | 0 | starting | show processlist | 0 | 0 | 0 |
+----+-------------+-----------------+------+---------+------+-------------------------------------+-------------------+---------+-----------+---------------+
7 rows in set (0.00 sec)
When inspected in gdb, we can see that the applier thread is waiting for the lock:
(gdb) t
[Current thread is 32 (Thread 0x7ecc100ee640 (LWP 367680))]
(gdb) bt
#0 futex_wait (private=0, expected=2, futex_word=0x5f910a753278) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x5f910a753278, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007ecc18e98167 in ___pthread_mutex_lock (mutex=0x5f910a753278) at ./nptl/pthread_mutex_lock.c:145
#3 0x00005f91086e8fa1 in native_mutex_lock (mutex=0x5f910a753278) at /home/venki/work/pxc/merge/57/include/thr_mutex.h:91
#4 0x00005f91086e8fd5 in my_mutex_lock (mp=0x5f910a753278) at /home/venki/work/pxc/merge/57/include/thr_mutex.h:189
#5 0x00005f91086e912e in inline_mysql_mutex_lock (that=0x5f910a753278, src_file=0x5f91091f1938 "/home/venki/work/pxc/merge/57/sql/rpl_slave.cc", src_line=8037)
at /home/venki/work/pxc/merge/57/include/mysql/psi/mysql_thread.h:718
#6 0x00005f9108701e84 in handle_slave_sql (arg=0x5f910a6fd880) at /home/venki/work/pxc/merge/57/sql/rpl_slave.cc:8037
#7 0x00005f910878d8ea in pfs_spawn_thread (arg=0x7ecb8823e540) at /home/venki/work/pxc/merge/57/storage/perfschema/pfs.cc:2198
#8 0x00007ecc18e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#9 0x00007ecc18f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Printing the mutex to find its owner shows that it is held by thread 367680 (the __owner field stores the LWP id of the owning thread):
(gdb) p *mutex
$5 = {__data = {__lock = 2, __count = 0, __owner = 367680, __nusers = 1, __kind = 3, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000@\234\005\000\001\000\000\000\003", '\000' <repeats 22 times>, __align = 2}
But thread 367680 is none other than the applier thread itself, i.e. the thread is blocked on a mutex it already holds:
(gdb) t
[Current thread is 32 (Thread 0x7ecc100ee640 (LWP 367680))]
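For illustration only (this is not PXC code; the names apply_event() and data_lock below are made up), the minimal sketch below reproduces the pattern the backtrace shows: a function that is expected to release a mutex on every return path leaks it on an error path, and the same thread later blocks on the mutex it still owns. An error-checking mutex is used so the program reports the self-deadlock instead of hanging forever, which is what happens with a regular non-recursive mutex.
===
// Minimal, self-contained sketch of a lock leak turning into a self-deadlock.
// Not PXC code: apply_event() and data_lock are illustrative names only.
#include <cstdio>
#include <cstring>
#include <pthread.h>

static pthread_mutex_t data_lock;        // stands in for rli->data_lock

// Contract: release data_lock on every return path.
// The error path below forgets to unlock, mirroring the missing
// mysql_mutex_unlock() in the wsrep_preordered error handling.
static int apply_event(bool error_path)
{
  pthread_mutex_lock(&data_lock);
  if (error_path)
    return 1;                            // BUG: returns with data_lock held
  pthread_mutex_unlock(&data_lock);
  return 0;
}

int main()
{
  // Error-checking type so the re-lock fails with EDEADLK instead of hanging.
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init(&data_lock, &attr);

  apply_event(/*error_path=*/true);      // leaks the lock

  // Later the same thread tries to take the lock again on its exit path,
  // just as the applier thread does at rpl_slave.cc:8037 in the backtrace.
  int rc= pthread_mutex_lock(&data_lock);
  std::printf("second lock: %s\n", rc ? std::strerror(rc) : "acquired");
  return 0;
}
===
Built with g++ -pthread, this prints something like "second lock: Resource deadlock avoided"; with a default mutex type the second lock blocks forever, which is exactly the "Waiting for slave mutex on exit" hang seen in the processlist above.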
Steps to reproduce: the same steps as for https://perconadev.atlassian.net/browse/PXC-4404 can be used to reproduce this bug.
Deploy an 8-node 5.7.44 cluster.
Set up node1 as an async replica of some external 5.7.44 server. Make sure to set wsrep_preordered=ON.
Start a write-only sysbench run using the “test” db on node8.
Start a write-only sysbench run using the “test2” db on the external async source node.
Force node2 of the cluster to “pause” frequently with
kill -SIGSTOP $PID_OF_NODE2; sleep 5; kill -SIGCONT $PID_OF_NODE2;
kill -SIGSTOP $PID_OF_NODE2; sleep 5; kill -SIGCONT $PID_OF_NODE2;
kill -SIGSTOP $PID_OF_NODE2; sleep 5; kill -SIGCONT $PID_OF_NODE2;
relates to: Lock leak could occur in PXC - 5.7
Activity
Venkatesh Prasad July 29, 2024 at 3:17 PM (Edited)
Found the root cause: the function apply_event_and_update_pos() is expected to release rli->data_lock.
Ref: here
In sql/rpl_slave.cc, in the exec_relay_log_event() function:
===
    exec_res= apply_event_and_update_pos(ptr_ev, thd, rli);
    /*
      Note: the above call to apply_event_and_update_pos executes
      mysql_mutex_unlock(&rli->data_lock);
    */
===
However, the unlock() call is missing when wsrep_preordered processing encounters a replication error. This leaks rli->data_lock and can cause the deadlock described above.
The patch below fixes the problem.
diff --git a/sql/rpl_slave.cc b/sql/rpl_slave.cc
index f094f66ba0e..2aa27757ed6 100644
--- a/sql/rpl_slave.cc
+++ b/sql/rpl_slave.cc
@@ -4937,6 +4937,7 @@ apply_event_and_update_pos(Log_event** ptr_ev, THD* thd, Relay_log_info* rli)
{
WSREP_WARN("failed to cancel preordered write set");
}
+ mysql_mutex_unlock(&rli->data_lock);
DBUG_RETURN(SLAVE_APPLY_EVENT_AND_UPDATE_POS_APPLY_ERROR);
}
}
@@ -4947,6 +4948,7 @@ apply_event_and_update_pos(Log_event** ptr_ev, THD* thd, Relay_log_info* rli)
&data, 1, 1)) != WSREP_OK)
{
WSREP_ERROR("wsrep preordered collect failed: %d", err);
+ mysql_mutex_unlock(&rli->data_lock);
DBUG_RETURN(SLAVE_APPLY_EVENT_AND_UPDATE_POS_APPLY_ERROR);
}
@@ -4966,10 +4968,12 @@ apply_event_and_update_pos(Log_event** ptr_ev, THD* thd, Relay_log_info* rli)
thd->wsrep_po_cnt= 0;
wsrep_uuid_t source;
memcpy(source.data, thd->wsrep_po_sid.bytes, sizeof(source.data));
if ((err= wsrep->preordered_commit(wsrep, &thd->wsrep_po_handle,
&source, flags, 1, true)) != WSREP_OK)
{
WSREP_ERROR("Failed to commit preordered event: %d", err);
+ mysql_mutex_unlock(&rli->data_lock);
DBUG_RETURN(SLAVE_APPLY_EVENT_AND_UPDATE_POS_APPLY_ERROR);
}
}
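As a side note (this is not part of the fix above, which keeps the file's existing style of an explicit mysql_mutex_unlock() before each early DBUG_RETURN), this class of leak is exactly what RAII-style scope guards prevent. Below is a generic C++ sketch with hypothetical names (LockReleaser, data_lock, apply_event_sketch), assuming a pthread mutex that the caller has already locked:
===
#include <pthread.h>

// Adopts a mutex that is already held and guarantees it is released on every
// return path of the enclosing scope, unless disarm() is called.
// Hypothetical helper for illustration; not part of the PXC code base.
class LockReleaser
{
public:
  explicit LockReleaser(pthread_mutex_t *m) : m_mutex(m) {}
  ~LockReleaser() { if (m_mutex) pthread_mutex_unlock(m_mutex); }
  void disarm() { m_mutex= 0; }          // opt out: someone else will unlock
private:
  pthread_mutex_t *m_mutex;
};

static pthread_mutex_t data_lock= PTHREAD_MUTEX_INITIALIZER;

// Entered with data_lock already held (like apply_event_and_update_pos());
// every early return now releases it automatically.
static int apply_event_sketch(bool error_path)
{
  LockReleaser releaser(&data_lock);
  if (error_path)
    return 1;                            // lock released by ~LockReleaser()
  return 0;                              // lock released by ~LockReleaser()
}
===
In newer code the same effect can be had with std::lock_guard and std::adopt_lock over a std::mutex; the sketch sticks to pthread types to match the surrounding 5.7 code.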
Venkatesh Prasad July 29, 2024 at 9:36 AM
@Kamil Holubicki The same steps as for https://perconadev.atlassian.net/browse/PXC-4404 can be used to reproduce this bug.
Deploy an 8-node 5.7.44 cluster.
Set up node1 as an async replica of some external 5.7.44 server. Make sure to set wsrep_preordered=ON.
Start a write-only sysbench run using the “test” db on node8.
Start a write-only sysbench run using the “test2” db on the external async source node.
Force node2 of the cluster to “pause” frequently with
kill -SIGSTOP $PID_OF_NODE2; sleep 5; kill -SIGCONT $PID_OF_NODE2;
kill -SIGSTOP $PID_OF_NODE2; sleep 5; kill -SIGCONT $PID_OF_NODE2;
kill -SIGSTOP $PID_OF_NODE2; sleep 5; kill -SIGCONT $PID_OF_NODE2;
Details
Assignee: Venkatesh Prasad
Reporter: Venkatesh Prasad
Needs QA: Yes
Affects versions:
Priority: Medium