A PXC node receiving write statements becomes unresponsive when another node enters or leaves the cluster while innodb_thread_concurrency is non-zero.


Description

Use a three-node PXC cluster with the following profile:

innodb_thread_concurrency is set to a non-zero value.
thread_handling is set to pool-of-threads.
One node receives write statements.
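The profile above can be written down as a my.cnf fragment. The following is a minimal Python sketch using the standard configparser module; the helper name render_profile and the value 16 are illustrative assumptions, not the reporter's configuration. pool-of-threads is the thread_handling value that enables Percona Server's thread pool.

```python
import configparser
import io


def render_profile(thread_concurrency: int = 16) -> str:
    """Render the reported node profile as a [mysqld] my.cnf fragment.

    Only the two settings named in the report are included; a real PXC
    node also needs its wsrep_* settings, which are omitted here.
    The default of 16 is an illustrative value, not the reporter's.
    """
    if thread_concurrency <= 0:
        # The profile requires a non-zero innodb_thread_concurrency.
        raise ValueError("innodb_thread_concurrency must be non-zero")
    cfg = configparser.ConfigParser()
    cfg["mysqld"] = {
        "innodb_thread_concurrency": str(thread_concurrency),
        "thread_handling": "pool-of-threads",
    }
    buf = io.StringIO()
    cfg.write(buf)  # writes "key = value" lines under [mysqld]
    return buf.getvalue()


if __name__ == "__main__":
    print(render_profile())
```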

The node receiving the writes stops processing them and becomes unresponsive in any of these scenarios:

Another node abruptly leaves the cluster, for example because of network issues or because mysqld dies.
Another node is stopped gracefully.
Another node is started and added to the working cluster.

The condition is triggered when the number of concurrent writes to the database is greater than or equal to innodb_thread_concurrency, provided innodb_thread_concurrency is greater than 1.
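The trigger condition above can be stated as a predicate. This minimal Python sketch encodes only what the report states (a membership change, concurrent writes at or above innodb_thread_concurrency, and a limit greater than 1); the function name is hypothetical, and the predicate says nothing about the underlying cause. Note that innodb_thread_concurrency = 0 means InnoDB places no limit on concurrently running threads.

```python
def stall_expected(concurrent_writes: int, innodb_thread_concurrency: int,
                   membership_change: bool) -> bool:
    """Return True when the reported trigger condition holds."""
    # 0 means "no limit", and the report states the limit must be > 1.
    if innodb_thread_concurrency <= 1:
        return False
    # A node joining or leaving must coincide with writers filling the limit.
    return membership_change and concurrent_writes >= innodb_thread_concurrency


if __name__ == "__main__":
    print(stall_expected(100, 16, membership_change=True))
    print(stall_expected(3, 16, membership_change=True))
```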

The bug reproductions and the production environments all ran Debian 11 with the 8.0.33-25-1.bullseye Debian packages.

Bug Reproduction Overview

The bug reproductions used the host names perctest1, perctest2, and perctest3. Each node used the same my.cnf configuration.

The MySQL write traffic used to reproduce this bug was generated with sysbench, run against the perctest1 node with either the oltp_insert.lua or the oltp_update_index.lua script. The bug does not reproduce with scripts such as oltp_read_write.lua. Sysbench was run with 100 threads.
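The load described above can be sketched as a sysbench 1.0 command line. This minimal Python helper only assembles the arguments; the option names (--mysql-host, --mysql-user, --mysql-password, --mysql-db, --threads, --time) are standard sysbench options, while the password, the 600-second --time, and the helper name are assumptions, not the reporter's command. Because the oltp scripts need their tables to exist, the same arguments are run once with prepare before run.

```python
from __future__ import annotations

import shlex

# Scripts the report says reproduce the stall; oltp_read_write does not.
REPRODUCING_SCRIPTS = ("oltp_insert", "oltp_update_index")


def sysbench_cmd(script: str, command: str = "run", host: str = "perctest1",
                 threads: int = 100, time_s: int = 600,
                 password: str = "CHANGE_ME") -> list[str]:
    """Build an argv list for a sysbench 1.0 OLTP run against one node.

    --mysql-user and --mysql-db are set to sbtest to match the sbtest
    user and database from the setup; password and time_s are placeholders.
    """
    if script not in REPRODUCING_SCRIPTS:
        raise ValueError(f"{script!r} is not reported to reproduce the bug")
    if command not in ("prepare", "run", "cleanup"):
        raise ValueError(f"unknown sysbench command {command!r}")
    return [
        "sysbench", script,
        f"--mysql-host={host}",
        "--mysql-user=sbtest",
        f"--mysql-password={password}",
        "--mysql-db=sbtest",
        f"--threads={threads}",
        f"--time={time_s}",
        command,
    ]


if __name__ == "__main__":
    # Create the tables once, then start the write load.
    for cmd in ("prepare", "run"):
        print(shlex.join(sysbench_cmd("oltp_insert", command=cmd)))
```

Rejecting oltp_read_write up front keeps the helper limited to the scripts that are reported to trigger the bug.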

Bug reproduction actual and expected results

After running any of the bug reproductions below, the perctest1 node becomes unresponsive and the sysbench test never finishes. Once the bug is triggered, commands such as show processlist; run against perctest1 never return. At this point, mysqld on perctest1 must be killed and recovered before it works properly again.

In our test harness, when sysbench is configured with 3 threads, perctest1 still responds to administrative commands, but it no longer processes any of the write/DML statements from sysbench. Sysbench never finishes, and perctest1 still needs to be recovered.

The expected result is that perctest1 continues processing queries, show processlist; returns the process list, and the cluster needs neither manual intervention nor recovery.
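Because the failure shows up as a statement that never returns, a check needs a timeout rather than waiting indefinitely. This is a minimal Python sketch, assuming a caller-supplied function that issues show processlist; against perctest1 through some MySQL client; that function, the helper name probe, and the 5-second default are hypothetical. The query runs in a daemon thread, and anything still running after the timeout is reported as unresponsive.

```python
import threading
import time
from typing import Callable


def probe(query: Callable[[], object], timeout_s: float = 5.0) -> str:
    """Run `query` in a daemon thread and report whether it returned in time.

    `query` is a hypothetical stand-in for a call that sends
    SHOW PROCESSLIST to the node. A hung call is abandoned, not
    cancelled: Python cannot kill a thread, so it is left as a daemon.
    """
    done = threading.Event()

    def runner() -> None:
        try:
            query()
        finally:
            done.set()  # signal completion even if the query raised

    threading.Thread(target=runner, daemon=True).start()
    return "responsive" if done.wait(timeout_s) else "unresponsive"


if __name__ == "__main__":
    print(probe(lambda: None, timeout_s=1.0))             # responsive
    print(probe(lambda: time.sleep(10), timeout_s=0.2))   # unresponsive
```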

Bug reproduction setup

Start a three-node PXC cluster with the hosts perctest1, perctest2, and perctest3, using the my.cnf configuration above.
Set up the sbtest database and user on the PXC cluster for the sysbench test. From mysql, run:

(Optional) From the mysql prompt, set wsrep_debug for more verbose logging.

Bug Reproduction 1: node abruptly leaves

This reproduces the bug where a node receiving writes becomes unresponsive when another node abruptly leaves the PXC cluster. Start with the "Bug reproduction setup" above.

Start sysbench on perctest1. On perctest1, run:

Wait until sysbench prints the "Threads started!" line from the step above, then kill mysqld on perctest3. On perctest3, run:

Now the bug is triggered. See the "Bug reproduction actual and expected results" section above.
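Each reproduction depends on firing the cluster event only after sysbench has printed "Threads started!". This is a minimal Python sketch of that ordering, assuming the sysbench output is available as an iterable of lines (for example the stdout of a subprocess) and that trigger is a hypothetical caller-supplied action, such as killing mysqld on perctest3.

```python
from typing import Callable, Iterable

MARKER = "Threads started!"


def trigger_after_start(sysbench_output: Iterable[str],
                        trigger: Callable[[], None],
                        marker: str = MARKER) -> bool:
    """Consume sysbench output until `marker` appears, then call `trigger`.

    Returns True if the trigger fired, False if the output ended first.
    Lines after the marker are left unconsumed for the caller.
    """
    for line in sysbench_output:
        if marker in line:
            trigger()
            return True
    return False


if __name__ == "__main__":
    fake_output = ["Initializing worker threads...", "Threads started!"]
    fired = trigger_after_start(fake_output, lambda: print("trigger fired"))
    print(fired)
```

Stopping at the marker leaves the rest of the output unread, so the caller can keep reading it afterwards.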

I’ll attach the err log from perctest1, with wsrep_debug set to 'SERVER'.

Bug Reproduction 2: node is shut down

This reproduces the bug where a node receiving writes becomes unresponsive when another node is shut down. Start with the "Bug reproduction setup" above.

Start sysbench on perctest1. On perctest1, run:

Wait until sysbench prints the "Threads started!" line from the step above. Then, from perctest3, run the following or a similar command to shut down mysqld:

Now the bug is triggered. See the "Bug reproduction actual and expected results" section above.

I’ll attach the err log from perctest1, with wsrep_debug set to 'SERVER'.

Bug Reproduction 3: node is started up

This reproduces the bug where a node receiving writes becomes unresponsive when another node in the cluster is started up. Start with the "Bug reproduction setup" above.

Log into perctest3 and gracefully shut down mysql with a command similar to the following:

Start sysbench on perctest1, as in the previous reproductions, and wait until the "Threads started!" line appears. Then log into perctest3 and start mysql by running the following or a similar command:

Now the bug is triggered. See the "Bug reproduction actual and expected results" section above.

I’ll attach the err log from perctest1, with wsrep_debug set to 'SERVER'.
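The three reproductions differ only in which membership change happens on perctest3 and whether perctest3 must be stopped first. This is a minimal Python sketch that records them as data and expands each into ordered steps. The shell commands are assumed examples for a Debian 11 host where PXC runs as the systemd unit mysql; they are not the reporter's exact commands, which are described only as "or similar". The names Scenario, SCENARIOS, and plan are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    number: int
    event: str                    # membership change seen by perctest1
    before_load: Optional[str]    # command on perctest3 before sysbench starts
    after_marker: str             # command on perctest3 after "Threads started!"


# ASSUMED commands for a Debian 11 host with a systemd unit named "mysql".
SCENARIOS = (
    Scenario(1, "node abruptly leaves", None, "kill -9 $(pidof mysqld)"),
    Scenario(2, "node is shut down", None, "systemctl stop mysql"),
    Scenario(3, "node is started up", "systemctl stop mysql", "systemctl start mysql"),
)


def plan(number: int) -> list[str]:
    """Expand one reproduction into its ordered steps."""
    scenario = next((s for s in SCENARIOS if s.number == number), None)
    if scenario is None:
        raise ValueError(f"no reproduction numbered {number}")
    steps = ["complete the bug reproduction setup"]
    if scenario.before_load:
        steps.append(f"on perctest3: {scenario.before_load}")
    steps += [
        "on perctest1: start sysbench (oltp_insert or oltp_update_index, 100 threads)",
        'wait for "Threads started!"',
        f"on perctest3: {scenario.after_marker}",
    ]
    return steps


if __name__ == "__main__":
    for s in SCENARIOS:
        print(f"Reproduction {s.number}: {s.event}")
        for i, step in enumerate(plan(s.number), start=1):
            print(f"  {i}. {step}")
```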

Environment

Debian 11. This reproduces on clusters running on physical hardware and in Docker containers.

Activity


Kamil Holubicki February 26, 2025 at 10:28 AM

I think it is already fixed in 8.0.39/8.4.2 by .

, could you please verify?

Aaditya Dubey March 30, 2024 at 1:22 PM

Thank you for the report.
Verified as described:

Aaron DeForest March 26, 2024 at 1:07 AM

Here is the log from perctest1.err, for the third repro:

Aaron DeForest March 26, 2024 at 1:07 AM

The last log was from perctest1.err

Aaron DeForest March 26, 2024 at 1:05 AM

Here is the ending of the second repro, from perctest2.err

Details

Assignee

Aaditya Dubey

Reporter

Aaron DeForest

Labels

reviewed

Needs QA

Yes

Affects versions

8.0.33-25 (Q2 2023)

8.0.35-27 (Q4 2023)

Priority

Medium


Created March 26, 2024 at 12:55 AM

Updated February 27, 2025 at 7:46 AM
