Flow control flapping hangs the cluster

Status

Done

Description

On PXC 8.0.36, a flapping flow control scenario can hang the cluster in a multi-writer environment. The issue also affects PXC 5.7.44 and 5.7.25.

InnoDB status from the affected node shows threads in the replicating state:

The receive queue does not show write-sets:

And flow control is still active:

Nodes 2 and 3 also show flow control as active:

Killing the threads doesn't fix the issue; the node must be restarted to recover the cluster:
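The hang signature described above (the receive queue is empty while flow control stays active) can be checked mechanically from status snapshots. The Python sketch below is a hypothetical helper, not part of PXC or of this report. It assumes each snapshot is the mysql client's tabular output of `SHOW GLOBAL STATUS LIKE 'wsrep_%'`, and it treats `wsrep_flow_control_status = ON` as "flow control active". Requiring several consecutive matching snapshots (an illustrative choice) avoids flagging a single snapshot taken mid-transition.

```python
def parse_status(text: str) -> dict[str, str]:
    """Parse mysql client table output of SHOW GLOBAL STATUS into a dict."""
    status: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Data rows look like "| wsrep_local_recv_queue | 0 |"; border rows start with "+".
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row; keep only name/value pairs.
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


def matches_hang_signature(status: dict[str, str]) -> bool:
    """True when flow control is ON but the receive queue is empty."""
    fc_on = status.get("wsrep_flow_control_status", "").upper() == "ON"
    queue_empty = status.get("wsrep_local_recv_queue") == "0"
    return fc_on and queue_empty


def looks_hung(snapshots: list[str], min_consecutive: int = 3) -> bool:
    """Hypothetical heuristic: the signature must persist across
    min_consecutive successive snapshots (oldest first)."""
    streak = 0
    for text in snapshots:
        if matches_hang_signature(parse_status(text)):
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0  # signature interrupted, start counting again
    return False


if __name__ == "__main__":
    stuck = """+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| wsrep_flow_control_status | ON    |
| wsrep_local_recv_queue    | 0     |
+---------------------------+-------+"""
    print(looks_hung([stuck] * 3))  # prints "True"
```

A single snapshot matching the signature is not conclusive on its own; the persistence requirement is what distinguishes a stuck node from normal flow control activity.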

How to repeat:

  1. Use the attached my.cnf to create a 3-node PXC 8.0.36 cluster.

  2. Create the following tables:

  3. On node 1, configure an 8M redo log and strict durability settings:

  4. On node 1, run the following command to produce flapping flow control:

Then run the following workload:

  5. On node 2, run the following commands:

Monitor flow control on node 1. You may need to add more inserts if flow control flaps only every several seconds.

Since it’s a race condition, it may take seconds to minutes to trigger the bug.
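To confirm that flow control is actually flapping on node 1 while waiting for the race, one option is to sample `wsrep_flow_control_status` at a fixed interval and count ON/OFF changes. The sketch below is hypothetical: the function names, the sampling interval, and the threshold of transitions per minute are illustrative assumptions, not values taken from this report.

```python
def count_transitions(samples: list[str]) -> int:
    """Count ON<->OFF changes between successive samples (oldest first)."""
    normalized = [s.strip().upper() for s in samples]
    return sum(1 for prev, cur in zip(normalized, normalized[1:]) if prev != cur)


def is_flapping(samples: list[str], interval_s: float, min_per_minute: float = 6.0) -> bool:
    """Hypothetical check: flag flapping when the transition rate over the
    sampled window reaches min_per_minute (an arbitrary illustrative threshold)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if len(samples) < 2:
        return False  # not enough samples to observe any transition
    window_minutes = (len(samples) - 1) * interval_s / 60.0
    rate = count_transitions(samples) / window_minutes
    return rate >= min_per_minute


if __name__ == "__main__":
    # Assumed: one value of wsrep_flow_control_status captured per second.
    samples = ["OFF", "ON", "OFF", "ON", "ON", "OFF", "ON", "OFF"]
    print(count_transitions(samples), is_flapping(samples, interval_s=1.0))  # prints "6 True"
```

If the rate stays low, that matches the case described above where more inserts may be needed to make flow control flap faster.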

Environment

None

AFFECTED CS IDs

CS0046107

Attachments

4

Details

Needs QA

Yes

In progress time

6.25

Time tracking

No time logged; 1w 1d 2h remaining

Created July 13, 2024 at 2:24 AM
Updated December 23, 2024 at 11:40 AM
Resolved September 27, 2024 at 7:45 AM

Activity

Kamil Holubicki September 27, 2024 at 7:44 AM

Yes.

Scott Hooper September 26, 2024 at 5:28 PM

Did this make it into the 8.0.37-29 release?

Aaditya Dubey July 19, 2024 at 6:43 AM

Hi

Please find the steps below:

Step 1: Clone anydbver from

Step 2: Navigate to the following path and add the my.cnf options:

Step 3: Add the following options to pxc8-repl-gtid.cnf, then save and exit:

Step 4: Deploy PXC 8.0.36 with anydbver using the following command:

Step 5: Connect to node1 and type mysql in the node1 terminal to open the MySQL client.

Step 6: Open the my.cnf file on node1, add the following parameters, then save and exit:

Step 7: Restart node1:

Step 8: Run the following commands in the background in the node1 terminal:

Step 9: Similarly, run the following set of commands on node2:

Step 10: Let it run for a few seconds to minutes, then connect to the node1 mysql client and observe flow control using the following commands:

Step 11: Once you start seeing | wsrep_local_recv_queue     | 0          |, try killing those queries and also check the InnoDB status, where you will see stuck transactions:

Aaditya Dubey July 18, 2024 at 5:37 PM

Hi

I’m able to repeat the behaviour as described.

Kamil Holubicki July 18, 2024 at 8:59 AM

Hi, unfortunately I’m not able to reproduce it. I tried for several hours without success.

Here is my setup:

  1. PXC 8.0.36

  2. Use the attached config file n1.cnf (adapt it for node2 and node3 according to the comments around line 46).

  3. Start the cluster of 3 nodes

  4. Start node-1-run.sh

  5. Wait until db is set up and the workload starts

  6. Start node-2-run.sh

  7. Wait

 

I tried different numbers of insert workloads, as suggested, but I’m still not able to reproduce the issue. Maybe I’m doing something wrong?
