Full cluster crash after voting about "Cannot add or update a child row"

Description

We have a 3-node cluster running 8.0.32, and we suffered a full cluster crash twice in one week for the same reason. I will describe the most recent occurrence:

After we tried to join a node (c39b) using wsrep_sst_method=clone, the new member tried to sync with the others using IST and then started a voting process after an error occurred due to a foreign key constraint failure:

The remaining nodes also got stuck after the voting process and started aborting connections too:

Assuming they all have gcs.vote_policy=0, the majority formed by nodes c39a and c39c should have been accepted, and the vote should not have caused a full cluster crash at all.
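To make the expectation concrete, here is a minimal sketch of a simple-majority vote, which is how we understand gcs.vote_policy=0. It is a simplified model for illustration only, not Galera's actual implementation. The function name majority_vote and the outcome labels "ok" and "fk_error" are hypothetical, and the per-node outcomes are assumed from the description above (c39a and c39c agreeing, c39b failing on the foreign key constraint).

```python
from collections import Counter


def majority_vote(results):
    """Simplified model of a simple-majority inconsistency vote.

    results maps a node name to its apply outcome (hypothetical labels).
    Returns (winning outcome, sorted list of nodes that disagree with it),
    or (None, []) when no outcome has a strict majority.
    """
    tally = Counter(results.values())
    (top, top_count), = tally.most_common(1)
    if top_count * 2 <= len(results):
        return None, []  # tie or no strict majority
    minority = sorted(node for node, r in results.items() if r != top)
    return top, minority


# Assumed outcomes for the incident described above.
votes = {"c39a": "ok", "c39b": "fk_error", "c39c": "ok"}
winner, minority = majority_vote(votes)
print(winner, minority)
```

Under this model only c39b ends up in the minority, so we would expect only that node to leave the cluster while c39a and c39c keep running.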

Environment

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"

Activity

Pablo Higueras June 24, 2024 at 10:14 AM

Hi, we plan to update our systems to 8.0.36, but not soon. I guess you can close the issue for now. Thanks for the information on the voting mechanism improvements since version 8.0.33.

Kamil Holubicki June 24, 2024 at 10:06 AM

Hi, any update on this?

Pablo Higueras June 11, 2024 at 2:10 PM

Yes, it happened as well with xtrabackup. Maybe we can try to update the version since it seems to be some voting algorithm issue.

Kamil Holubicki June 11, 2024 at 1:56 PM

Hello, thank you for your report. Unfortunately, without detailed steps to reproduce the issue, we are not able to fix it. The voting mechanism has been improved since 8.0.33 (PXC-4365, PXC-4365).

Note that PXC does not support wsrep_sst_method=clone.

Does the issue happen when the xtrabackup-v2 method is used?

Won't Do

Details

Needs QA: Yes

Created June 11, 2024 at 1:41 PM
Updated June 24, 2024 at 11:28 AM
Resolved June 24, 2024 at 11:28 AM