LP #1692745: All nodes on cluster crash on foreign key check

Description

**Reported in Launchpad by Marcelo Altmann last update 03-10-2017 06:18:30

On a 3 nodes pxc cluster, the whole cluster goes down when foreign key constrain fail.

2017-05-22 20:37:29 983 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 158, Error_code: 1451
2017-05-22 20:37:29 983 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 203
2017-05-22 20:37:29 983 [Warning] WSREP: Failed to apply app buffer: seqno: 203, status: 1
at galera/src/trx_handle.cpp:apply():351

How to reproduce:
simplified verion of https://jira.mariadb.org/browse/MDEV-12398

– Prepare -

CREATE DATABASE fk;
CREATE TABLE fk.parent (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, b varchar(10));
CREATE TABLE fk.child (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, pID INT, CONSTRAINT fk_parent FOREIGN KEY (pID) REFERENCES fk.parent (ID) ON DELETE NO ACTION ON UPDATE NO ACTION);
– Node1
cat << EOF > node1.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN="mysqladmin -u root -psekret"
while : ; do

ID=\`\$MYSQL -BN -e "INSERT INTO fk.parent VALUES (NULL, 'test'); SELECT LAST_INSERT_ID();"\`
sleep 1
\$MYSQL -e "DELETE FROM fk.parent WHERE ID=\$ID"
\$MYSQLADMIN ping > /dev/null
if [[ \$? -ne 0 ]]
then
break
fi
done
EOF

chmod +x node1.sh

– add delay
tc qdisc add dev eth0 root handle 1: netem delay 35ms

./node1.sh

– Node2
cat << EOF > node2.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN="mysqladmin -u root -psekret"
while : ; do

ID=\`\$MYSQL -BN -e "INSERT INTO fk.child SELECT NULL, ID FROM fk.parent ORDER BY ID DESC LIMIT 1; SELECT LAST_INSERT_ID();"\`
sleep 2
\$MYSQL -e "DELETE FROM fk.child WHERE ID=\$ID"
\$MYSQLADMIN ping > /dev/null
if [[ \$? -ne 0 ]]
then
break
fi
done
EOF

chmod +x node2.sh

tc qdisc add dev eth0 root handle 1: netem delay 35ms
./node2.sh

– Cleanup
DELETE FROM fk.child; DELETE FROM fk.parent;

– tested
5.7.17-13-57
5.6.35-81.0-56

my.cnf attached

Environment

None

Smart Checklist

Activity

Show:

Thomas Wilhelm September 4, 2018 at 9:25 AM

Hi Team,

 

we have percona cluster pxc  3-Node Cluster 

percona-xtradb-cluster-server-5.7    5.7.21-29.26-1.stretch

Today we had the same issue.

Logs:

Tables:

Two of our three nodes stopped working an shut down.

 

Can you please check this?

 

 

 

lpjirasync January 12, 2018 at 9:30 AM

**Comment from Launchpad by: Krunal Bauskar on: 03-10-2017 06:18:30

  • Fixes for certfication key generation in the presence of foreign key
    constraints (MW-369)

lpjirasync January 12, 2018 at 9:30 AM

**Comment from Launchpad by: Krunal Bauskar on: 03-10-2017 06:12:03

Bug was fixed as part of upstream refresh from Codership 5.6.36 and 5.7.18.

PXC recent release .6.37 and .7.19 carry the said fix.

Release: https://www.percona.com/blog/2017/09/20/percona-xtradb-cluster-5-6-37-26-21-is-now-available/

Release https://www.percona.com/blog/2017/09/22/percona-xtradb-cluster-5-7-19-29-22-now-available/

lpjirasync January 12, 2018 at 9:30 AM

**Comment from Launchpad by: Florian on: 19-07-2017 20:59:17

I Can also confirm the issue on PXC 5.6, though we "only" experience 1 out of 3 nodes to crash in a 3-way-master setup (same error though).

We were able to narrow the issue down to 2 consecutive transactions (first deletes from table that is referenced by foreign key which seem to be applied out of order by the 2 slaves. The transaction usually fails only on one node after 4 tries (which crashes the node then, because he thinks he is no longer consistent). The other slave node tries 1 or 2 times and then gets it right. I suspect some kind of out-of-order-committing, maybe there is a race condition in the code that makes sure transaction will be applied in order?

Btw, we double checked repl.commit_order to make sure it has been set to never apply out-of-order on any of the nodes...

Would be glad if someone from Percona could have a look if there is a configurable workaround until no patched version is released.

lpjirasync January 12, 2018 at 9:30 AM

**Comment from Launchpad by: Jericho Rivera on: 30-05-2017 03:56:49

Confirmed on PXC 5.6 as well:

Node2:
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 160, Error_code: 1451
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 56
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [Warning] WSREP: Failed to apply app buffer: seqno: 56, status: 1
May 30 02:51:01 localhost mysqld: #011 at galera/src/trx_handle.cpp:apply():351

Node3:
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 160, Error_code: 1451
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 56
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [Warning] WSREP: Failed to apply app buffer: seqno: 56, status: 1
May 30 02:51:01 localhost mysqld: #011 at galera/src/trx_handle.cpp:apply():351

Done

Details

Assignee

Reporter

Priority

Smart Checklist

Created January 12, 2018 at 9:29 AM
Updated March 6, 2024 at 11:16 PM
Resolved December 15, 2023 at 2:26 PM