PXC in a Desynced state after joiner being killed

Description

Scenario:

  • 1 bootstrapped node,

  • 2 joiners started at the same time. The first was doing SST; the second was waiting for a donor to become available.

  • The joiner was killed during the prepare state. The donor hung in the donor/desync state:

Environment

None

AFFECTED CS IDs

CS0026005

Activity

puneet.kaushik 
May 17, 2022 at 5:32 AM

Bug fix verified in PXC 5.7.37.

Kamil Holubicki 
April 14, 2022 at 2:05 PM

Hi,

The documentation for the SST timeout config parameters (see my comment of April 13) will probably need to be updated.

Kamil Holubicki 
April 13, 2022 at 12:17 PM

How to test
Use a 2-node cluster in which n1 has wsrep_provider_options="pc.weight=3".

  1. Start n1. Load some data using sysbench (particularly useful for testing Case 2).

  2. Start n2.

Case 1:

3. When the n1 log says 'Sleeping before data transfer for SST', partition the network.

Case 2: 

3. When the n1 log says 'Streaming the backup to joiner at...', which appears just after 'Sleeping before data transfer for SST', partition the network. The partition must happen while the SST transfer is in progress.

4. Wait.

 

Expected result

In both cases, the donor should cancel serving SST and go back to the SYNCED state. For case 1 the timeout is determined by the combination of 'retry=N' and 'donor-timeout' (for example, retry=30 and donor-timeout=2 give about 60 seconds). For case 2 the timeout is 'sst-idle-timeout'.

The joiner should abort after 'sst-idle-timeout' seconds (default: 120).

 

Partition network: iptables -P INPUT DROP && iptables -P OUTPUT DROP

Enable network: iptables -P INPUT ACCEPT && iptables -P OUTPUT ACCEPT
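
The two cases differ only in which donor log line triggers the partition. Below is a minimal Python sketch of that trigger. The function name, the `CASE_MARKERS` table and the dry-run default are illustrative assumptions; the markers and the iptables commands are the ones quoted in this comment. With `dry_run=True` nothing is executed, and the command that would be run is only returned.

```python
import subprocess

# Commands quoted in this comment (they must be run as root on the test node).
PARTITION = "iptables -P INPUT DROP && iptables -P OUTPUT DROP"
HEAL = "iptables -P INPUT ACCEPT && iptables -P OUTPUT ACCEPT"

# Donor log markers for the two test cases described above.
CASE_MARKERS = {
    1: "Sleeping before data transfer for SST",
    2: "Streaming the backup to joiner at",
}


def partition_on_marker(log_lines, case, dry_run=True):
    """Scan donor log lines and partition the network at the first marker hit.

    Returns the partition command if the marker was seen, otherwise None.
    """
    marker = CASE_MARKERS[case]
    for line in log_lines:
        if marker in line:
            if not dry_run:
                # Hypothetical usage: only do this on a disposable test host.
                subprocess.run(PARTITION, shell=True, check=True)
            return PARTITION
    return None
```

In a real run `log_lines` could be an open file object on the n1 error log that is being followed; after the test, `HEAL` restores connectivity.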

 

SST timeout control

The following parameters in the [sst] section of the configuration file control SST timeouts:
sockopt - if it contains retry=N, N is used; otherwise 30.
donor-timeout - (default: 10). The value of 'connect-timeout' on the donor side.
joiner-timeout (sst-initial-timeout) - (default: 60). Time for the joiner to wait for the SST transfer to start.
sst-idle-timeout - (default: 120). Timeout for a stalled transfer. If no data is sent or received within this time window, the SST process is aborted.
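
To see which values a given [sst] section ends up with, the rules above can be sketched in Python. This is an illustration only, not how the SST script itself reads the configuration: the function name and the `donor-connect-window` key are hypothetical, and multiplying retry by donor-timeout is an assumption based on the retry=30, donor-timeout=2 example (about 60 seconds) in the expected result.

```python
import configparser
import re

# Defaults as listed in this comment; other versions may differ.
DEFAULTS = {
    "retry": 30,
    "donor-timeout": 10,
    "joiner-timeout": 60,
    "sst-idle-timeout": 120,
}


def effective_sst_timeouts(cnf_text):
    """Return the SST timeout values implied by the [sst] section of cnf_text."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    sst = parser["sst"] if parser.has_section("sst") else {}

    result = dict(DEFAULTS)

    # sockopt: if it contains retry=N, N is used, otherwise the default stays.
    match = re.search(r"retry=(\d+)", sst.get("sockopt") or "")
    if match:
        result["retry"] = int(match.group(1))

    for key in ("donor-timeout", "joiner-timeout", "sst-idle-timeout"):
        value = sst.get(key)
        if value:
            result[key] = int(value.strip("\"'"))

    # Assumption: the donor gives up after roughly retry * donor-timeout seconds.
    result["donor-connect-window"] = result["retry"] * result["donor-timeout"]
    return result


sample = '[sst]\nsockopt="retry=30"\ndonor-timeout=2\n'
print(effective_sst_timeouts(sample))
```

Note that `configparser` does not understand every my.cnf construct (for example `!includedir` lines), so the sketch expects a fragment that contains only plain sections and key=value lines.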

Iwo Panowicz 
April 11, 2022 at 1:27 PM

The easiest way of reproducing that is to simulate network issues.

 

For instance, use a single-node cluster with `garbd --sst` for testing.

 

Joiner:

garbd

./receive

 

 

Donor:

donor logs

 

Just after the backup starts, it is enough to block all communication between the joiner and the donor, for instance with iptables -P INPUT DROP && iptables -P OUTPUT DROP. It is important to do this in a way that prevents any communication at all (not even an RST may get through).

 

In that case PXC waits for socat to finish one way or another (with either success or failure). More interestingly, Galera notices that the joiner has already left the cluster, yet the backup continues to be streamed.

 

 

This PXC feature is also used for taking backups in PXC-Operator when 5.7 is used. For 8.0, --recv-script is used, which also forces garbd to reply to Galera messages while the backup is in progress.

 

In some cases socat can wait hours or days for the connection to expire, depending on the infrastructure and topology. The 16 minutes in the above example is also a very long time.

 

Done

Time tracking

1d 4h 31m logged, 6h 18m remaining

Created August 25, 2020 at 8:16 AM
Updated May 27, 2024 at 12:08 PM
Resolved April 14, 2022 at 2:03 PM