PXC-4453

Flow control flapping hangs the cluster

Done

Description

On PXC 8.0.36, a flapping flow control scenario may hang the cluster in a multi-writer environment. It also affects 5.7.44 and 5.7.25.

InnoDB status from the affected node shows threads in replicating state:

The receive queue does not show write-sets:

And flow control is still active:

Nodes 2 and 3 also show flow control as active:

Killing the threads doesn't fix the issue; the node must be restarted to recover the cluster:
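The symptom described above (flow control still active while the receive queue is empty) can be checked mechanically. The sketch below is an illustration only. It assumes you capture status with mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" (tab-separated name/value lines), and that the node reports Percona's wsrep_flow_control_status variable as ON/OFF. The function names parse_status and looks_hung are hypothetical.

```python
"""Hypothetical helper: detect the hang signature from wsrep status output.

Assumes input in the tab-separated batch format produced by
  mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"
i.e. one "name<TAB>value" pair per line.
"""


def parse_status(text: str) -> dict[str, str]:
    """Turn name<TAB>value lines into a dict; blank or malformed lines are skipped."""
    status = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2:
            status[parts[0].strip()] = parts[1].strip()
    return status


def looks_hung(status: dict[str, str]) -> bool:
    """True when flow control is ON although the receive queue is empty.

    Mirrors the reported symptom: no write-sets are queued, yet flow
    control stays active and replication does not progress.
    """
    fc_on = status.get("wsrep_flow_control_status", "").upper() == "ON"
    queue_empty = status.get("wsrep_local_recv_queue") == "0"
    return fc_on and queue_empty


if __name__ == "__main__":
    sample = "wsrep_local_recv_queue\t0\nwsrep_flow_control_status\tON\n"
    print(looks_hung(parse_status(sample)))
```

A single positive sample can be an ordinary short pause. In the reported hang the condition persists until the node is restarted, so check it repeatedly before drawing conclusions.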

How to repeat:

Use the attached my.cnf to create a 3-node PXC 8.0.36 cluster.
Create the following tables:

On node 1, configure an 8M redo log and strict durability settings:

On node 1, run the following command to produce a flow control flapping behavior:

Then run the following workload:

On node 2, run the following commands:

Monitor flow control on node 1. If flow control only flaps every several seconds, you may need to add more insert workloads.

Since it’s a race condition, it may take seconds to minutes to trigger the bug.
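"Flapping" here means flow control switching on and off in rapid succession. The simplified model below shows why a narrow gap between the pause and resume thresholds produces it. It takes the real Galera provider options gcs.fc_limit (pause when the receive queue exceeds this) and gcs.fc_factor (resume once the queue drops to this fraction of the limit) as inputs. Beyond that, it is an assumption for illustration, not Galera's implementation: it replays a fixed queue-length trace, ignores how pausing actually drains the queue, and ignores any scaling Galera may apply to the limit. The function name simulate_fc is hypothetical.

```python
"""Simplified, hypothetical model of Galera-style flow control hysteresis."""


def simulate_fc(queue_lengths: list[int], fc_limit: int, fc_factor: float) -> list[bool]:
    """Return the flow-control state (True = paused) after each queue sample.

    Assumption: pause when the queue exceeds fc_limit, resume once it is
    at or below fc_limit * fc_factor. The trace is replayed as given; the
    model does not feed the paused state back into the queue length.
    """
    resume_at = fc_limit * fc_factor
    paused = False
    states = []
    for q in queue_lengths:
        if not paused and q > fc_limit:
            paused = True   # queue grew past the limit: request a pause
        elif paused and q <= resume_at:
            paused = False  # queue drained enough: let replication continue
        states.append(paused)
    return states


if __name__ == "__main__":
    trace = [3, 5, 3, 5, 3, 5]  # queue oscillating around the limit
    print(simulate_fc(trace, fc_limit=4, fc_factor=1.0))
    print(simulate_fc(trace, fc_limit=4, fc_factor=0.5))
```

With fc_factor=1.0 the modelled node changes state on every sample of this oscillating trace. With 0.5 it pauses once and stays paused until the queue really drains.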

Environment

None

AFFECTED CS IDs

CS0046107

Attachments

Linked issues

blocks

PXC-4407

PXC 8.0.37

Details

Assignee

Kamil Holubicki

Reporter

Juan Arruti

Labels

cs-tag-011

Needs QA

Yes

In progress time

6.25

Time tracking

No time logged; 1w 1d 2h remaining

Sprint

None

Fix versions

8.0.37-29 (Q2 2024)

8.4.0 (Q2 2024)

Affects versions

8.0.36-28 (Q1 2024)

Priority

Medium

Created July 13, 2024 at 2:24 AM

Updated December 23, 2024 at 11:40 AM

Resolved September 27, 2024 at 7:45 AM

Activity

Kamil Holubicki September 27, 2024 at 7:44 AM

Yes.

Scott Hooper September 26, 2024 at 5:28 PM

Did this make it in the 8.0.37-29 released code?

Aaditya Dubey July 19, 2024 at 6:43 AM

Hi

Please find the steps below:

Step 1: Clone anydbver from

Step 2: Navigate to the following path and add the my.cnf options:

Step 3: Add the following options to pxc8-repl-gtid.cnf, then save and exit:

Step 4: Deploy PXC 8.0.36 with anydbver using the following command:

Step 5: Connect to node1 and type mysql in the node1 terminal to open the MySQL client.

Step 6: Open the my.cnf file on node1, add the following parameters, then save and exit:

Step 7: Restart node1:

Step 8: Run the following commands in the background in the node1 terminal:

Step 9: Similarly, run the following commands on node2:

Step 10: Let it run for a few seconds to minutes, then connect to the node1 MySQL client and observe flow control with the following commands:

Step 11: Once you start seeing | wsrep_local_recv_queue | 0 |, try killing those queries, and also check the InnoDB status, where you will see stuck transactions:
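For step 10, watching the output by eye works, but flapping can also be flagged automatically. The class below is a hypothetical helper (FlapDetector is not part of any Percona tool). It takes periodic samples of wsrep_flow_control_status and reports flapping when the number of ON/OFF changes inside a sliding time window exceeds a threshold. The window length and threshold are illustrative assumptions, not values from this report.

```python
"""Hypothetical helper: flag flow-control flapping from periodic samples."""
from collections import deque


class FlapDetector:
    def __init__(self, window_s: float = 60.0, max_transitions: int = 6):
        self.window_s = window_s                # sliding window length, seconds (assumed)
        self.max_transitions = max_transitions  # more changes than this = flapping (assumed)
        self._events = deque()                  # timestamps of ON<->OFF changes
        self._last = None                       # last observed value

    def add(self, ts: float, value: str) -> bool:
        """Record one (timestamp, ON/OFF) sample; return True while flapping."""
        value = value.strip().upper()
        if self._last is not None and value != self._last:
            self._events.append(ts)
        self._last = value
        # Drop state changes that have fallen out of the sliding window.
        while self._events and ts - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) > self.max_transitions


if __name__ == "__main__":
    detector = FlapDetector(window_s=10, max_transitions=3)
    for t, v in enumerate(["ON", "OFF", "ON", "OFF", "ON"]):
        print(t, v, detector.add(t, v))
```

Counting changes over a time window, rather than reacting to a single ON sample, separates flapping from an ordinary, sustained pause.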

Aaditya Dubey July 18, 2024 at 5:37 PM

I’m able to repeat the behaviour as described.

Kamil Holubicki July 18, 2024 at 8:59 AM

Hi, unfortunately I’m not able to reproduce it. I tried for several hours with no result.

Here is my setup:

PXC 8.0.36
Use the attached config file n1.cnf (modify it for node2 and node3 according to the comments around line 46)
Start the cluster of 3 nodes
Start node-1-run.sh
Wait until db is set up and the workload starts
Start node-2-run.sh
Wait

I tried different numbers of insert workloads as suggested, but I’m still not able to reproduce the issue. Maybe I’m doing something wrong?