A PXC node receiving write statements will become unresponsive when another node enters or leaves the cluster, and innodb_thread_concurrency is non-zero.
General
Escalation
Description
Environment
Debian 11. The issue reproduces on clusters running on physical hardware and in Docker containers.
Activity

Kamil Holubicki February 26, 2025 at 10:28 AM
I think it is already fixed in 8.0.39/8.4.2 by .
, could you please verify?

Aaditya Dubey March 30, 2024 at 1:22 PM
Hi
Thank you for the report.
Verified as described:

Aaron DeForest March 26, 2024 at 1:07 AM
Here is the log from perctest1.err, for the third repro:

Aaron DeForest March 26, 2024 at 1:07 AM
The last log was from perctest1.err

Aaron DeForest March 26, 2024 at 1:05 AM
Here is the ending of the second repro, from perctest2.err
Details
Assignee

Reporter

Labels
Needs QA
Yes
Affects versions
Priority
Created March 26, 2024 at 12:55 AM
Updated February 27, 2025 at 7:46 AM
Use a three-node PXC cluster with the following profile:
innodb_thread_concurrency is set to a non-zero value.
thread_handling is set to pool-of-threads.
One node receives write commands.
The node receiving the writes stops processing them and becomes unresponsive in any of these scenarios:
Another node abruptly leaves the cluster. The cause can be network issues, or mysqld dying.
Another node is gracefully stopped.
Another node is started and added to the working cluster.
The condition is triggered when the concurrent writes to the database are greater than or equal to the innodb_thread_concurrency setting, so long as innodb_thread_concurrency is greater than 1.
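The trigger condition can be written as a small predicate. This is only a sketch of the rule as stated above; the function name hang_expected is hypothetical, and a membership change (a node leaving or joining) is still required for the hang to occur.

```shell
# hang_expected WRITERS ITC
# Succeeds (exit 0) when the stated condition holds:
#   innodb_thread_concurrency (ITC) is greater than 1, and
#   the number of concurrent writers is greater than or equal to ITC.
hang_expected() {
  writers="$1"
  itc="$2"
  [ "$itc" -gt 1 ] && [ "$writers" -ge "$itc" ]
}

# Example: 100 sysbench threads against a hypothetical setting of 8.
if hang_expected 100 8; then
  echo "condition met: a node joining or leaving may hang the writer node"
fi
```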
The bug reproductions and the production environments all ran Debian 11, using the 8.0.33-25-1.bullseye Debian packages.
Bug Reproduction Overview
The bug reproductions were run using the host names perctest1, perctest2, and perctest3. The my.cnf configuration used for each node follows the profile described above.
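A minimal sketch of the two my.cnf settings the profile names, printed from a shell function. The full my.cnf from the report is not shown here, so the value 8 and the function name repro_cnf are assumptions; the wsrep and cluster address settings that a real node needs are deliberately left out.

```shell
# Print a hypothetical my.cnf fragment with the settings named in the profile.
repro_cnf() {
  cat <<'EOF'
[mysqld]
# Assumption: any value greater than 1 should do; 8 is not taken from the report.
innodb_thread_concurrency = 8
# Thread pool handling, as named in the profile.
thread_handling = pool-of-threads
EOF
}

repro_cnf
```

The output can be merged into each node's existing configuration, for example by redirecting it to a file under the directory the packages include from my.cnf.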
The mysql write traffic used for reproducing this bug was generated with sysbench. Sysbench was run against the perctest1 node, using either the oltp_insert.lua or the oltp_update_index.lua sysbench script. This bug doesn't reproduce with sysbench scripts such as oltp_read_write.lua. The reproductions ran sysbench with 100 threads.
Bug reproduction actual and expected results
After running one of the bug reproductions below, the perctest1 node becomes unresponsive, and the sysbench test never finishes. Once the bug is triggered, commands like show processlist; never return when they are run against perctest1. At this point, mysqld on perctest1 needs to be killed and recovered before it will work properly again.
In our test harness, when sysbench is configured for 3 threads, perctest1 still responds to administrative commands, but it no longer processes any of the write/DML statements from sysbench. Sysbench never finishes, and perctest1 still needs to be recovered.
The expected result is that perctest1 continues processing queries. The show processlist; command returns the process list. The cluster does not need manual intervention and does not need to be recovered.
Bug reproduction setup
Start a three-node PXC cluster, naming the hosts perctest1, perctest2, and perctest3. Use the my.cnf configuration above.
Set up the sbtest database and user on the PXC cluster for the sysbench test. From mysql, run:
(Optional) From the mysql prompt, set wsrep_debug for more logging.
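The exact statements are not shown here, so the following is a sketch of what the setup could look like. The password, the '%' host pattern, and the function names are assumptions; the wsrep_debug value 'SERVER' is the one mentioned in the reproductions below.

```shell
SB_USER="${SB_USER:-sbtest}"
SB_PASS="${SB_PASS:-change-me}"   # placeholder password, an assumption

# SQL that creates the sysbench schema and user (replicated to all nodes).
setup_sql() {
  cat <<EOF
CREATE DATABASE IF NOT EXISTS sbtest;
CREATE USER IF NOT EXISTS '${SB_USER}'@'%' IDENTIFIED BY '${SB_PASS}';
GRANT ALL PRIVILEGES ON sbtest.* TO '${SB_USER}'@'%';
EOF
}

# Optional: extra wsrep logging. SET GLOBAL only affects the node it runs on.
debug_sql() {
  echo "SET GLOBAL wsrep_debug = 'SERVER';"
}

# Print the statements; pipe them into the mysql client to apply them.
setup_sql
debug_sql
```

For example, `setup_sql | mysql -u root -p` on any one node would apply the schema and user, while `debug_sql` would need to be run on each node whose log should carry the extra detail.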
Bug Reproduction 1: node abruptly leaves
This is how to reproduce the bug where a node receiving writes becomes unresponsive, when another node abruptly leaves the PXC cluster. Start with the "Bug reproduction setup" above.
Start sysbench on perctest1. On perctest1, run:
Wait until the "Threads started!" line appears from the step above. Kill mysql on perctest3. On perctest3, run:
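The two commands for this reproduction might look like the sketch below. Credentials, table count, table size, and run time are assumptions, as are the helper names; only the script names and the 100 threads come from the overview. The script is a dry run by default and just prints the commands.

```shell
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands, 0 = execute them
SB_SCRIPT="${SB_SCRIPT:-oltp_insert}"   # or oltp_update_index

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# sb COMMAND: sysbench prepare | run | cleanup against the local node (perctest1).
sb() {
  run sysbench "$SB_SCRIPT" \
    --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=change-me \
    --mysql-db=sbtest --tables=1 --table-size=10000 \
    --threads=100 --time=300 "$1"
}

# On perctest3: kill mysqld without a clean shutdown.
kill_mysqld() {
  run sudo pkill -9 -x mysqld
}

case "${1:-show}" in
  prepare) sb prepare ;;      # perctest1, once, before the test
  load)    sb run ;;          # perctest1, wait for "Threads started!"
  kill)    kill_mysqld ;;     # perctest3, after the load has started
  *)       DRY_RUN=1; sb prepare; sb run; kill_mysqld ;;   # just print everything
esac
```

Saved as a script, it would be called with DRY_RUN=0 and the argument that belongs on the current host, for example `load` on perctest1 and `kill` on perctest3.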
Now the bug is triggered. See the "Bug reproduction actual and expected results" section above.
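One way to check for the hang without blocking a terminal is to wrap the client call in a timeout. This is a sketch: probe_node and MYSQL_BIN are hypothetical names, the 10 second default is arbitrary, and credentials are assumed to come from the client's own option file.

```shell
MYSQL_BIN="${MYSQL_BIN:-mysql}"   # client binary, overridable for testing

# probe_node HOST [SECONDS]
# Exit status 0: SHOW PROCESSLIST answered in time.
# Exit status 124: the client was still waiting when the timeout expired,
#                  which matches the unresponsive state described above.
# Any other status: a different failure (connection refused, bad login, ...).
probe_node() {
  host="$1"
  limit="${2:-10}"
  timeout "$limit" "$MYSQL_BIN" -h "$host" -e 'SHOW PROCESSLIST' >/dev/null 2>&1
}

# Usage, from a host that can reach the node:
#   probe_node perctest1 10; echo "exit status: $?"
```

In the 3-thread case, where administrative commands still work, this probe would succeed even though writes are stuck, so it only detects the fully unresponsive state.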
I’ll attach the err log from perctest1, with wsrep_debug set to 'SERVER'.
Bug Reproduction 2: node is shut down
This is how to reproduce the bug where a node receiving writes becomes unresponsive, when another node is shut down. Start with the "Bug reproduction setup" above.
Start sysbench on perctest1. On perctest1, run:
Wait until the "Threads started!" line appears from the step above. From perctest3, run the following, or similar:
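A graceful stop could be issued as in this sketch. The systemd unit name mysql is an assumption based on the Debian packages; the helper names are hypothetical, and the script only prints the command unless DRY_RUN=0 is set.

```shell
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = execute it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# On perctest3: ask systemd to shut mysqld down cleanly.
stop_node() {
  run sudo systemctl stop mysql
}

stop_node
```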
Now the bug is triggered. See the "Bug reproduction actual and expected results" section above.
I’ll attach the err log from perctest1, with wsrep_debug set to 'SERVER'.
Bug Reproduction 3: node is started up
This is how to reproduce the bug where a node receiving writes becomes unresponsive, when another node in the cluster is started up. Start with the "Bug reproduction setup" above.
Log into perctest3, and gracefully shutdown mysql with a command similar to the following:
Log in a second time to perctest1, and start sysbench:
Wait until the "Threads started!" line appears from the step above. Log into perctest3, and start mysql by running the following, or similar:
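Starting the stopped node again might look like the sketch below, with an optional query to watch it rejoin. The unit name mysql and the helper names are assumptions; wsrep_local_state_comment is the standard Galera status variable that reports states such as Joined or Synced.

```shell
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# On perctest3: start mysqld so that it rejoins the running cluster.
start_node() {
  run sudo systemctl start mysql
}

# On perctest3: report the node's own cluster state after startup.
node_state() {
  run mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"
}

start_node
node_state
```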
Now the bug is triggered. See the "Bug reproduction actual and expected results" section above.
I’ll attach the err log from perctest1, with wsrep_debug set to 'SERVER'.