LP #1692745: All nodes on cluster crash on foreign key check
Description
Environment
Smart Checklist
Activity
Thomas Wilhelm September 4, 2018 at 9:25 AM
Hi Team,
we have percona cluster pxc 3-Node Cluster
percona-xtradb-cluster-server-5.7 5.7.21-29.26-1.stretch
Today we had the same issue.
Logs:
Tables:
Two of our three nodes stopped working an shut down.
Can you please check this?
lpjirasync January 12, 2018 at 9:30 AM
**Comment from Launchpad by: Krunal Bauskar on: 03-10-2017 06:18:30
Fixes for certfication key generation in the presence of foreign key
constraints (MW-369)
lpjirasync January 12, 2018 at 9:30 AM
**Comment from Launchpad by: Krunal Bauskar on: 03-10-2017 06:12:03
Bug was fixed as part of upstream refresh from Codership 5.6.36 and 5.7.18.
PXC recent release .6.37 and .7.19 carry the said fix.
Release: https://www.percona.com/blog/2017/09/20/percona-xtradb-cluster-5-6-37-26-21-is-now-available/
Release https://www.percona.com/blog/2017/09/22/percona-xtradb-cluster-5-7-19-29-22-now-available/
lpjirasync January 12, 2018 at 9:30 AM
**Comment from Launchpad by: Florian on: 19-07-2017 20:59:17
I Can also confirm the issue on PXC 5.6, though we "only" experience 1 out of 3 nodes to crash in a 3-way-master setup (same error though).
We were able to narrow the issue down to 2 consecutive transactions (first deletes from table that is referenced by foreign key which seem to be applied out of order by the 2 slaves. The transaction usually fails only on one node after 4 tries (which crashes the node then, because he thinks he is no longer consistent). The other slave node tries 1 or 2 times and then gets it right. I suspect some kind of out-of-order-committing, maybe there is a race condition in the code that makes sure transaction will be applied in order?
Btw, we double checked repl.commit_order to make sure it has been set to never apply out-of-order on any of the nodes...
Would be glad if someone from Percona could have a look if there is a configurable workaround until no patched version is released.
lpjirasync January 12, 2018 at 9:30 AM
**Comment from Launchpad by: Jericho Rivera on: 30-05-2017 03:56:49
Confirmed on PXC 5.6 as well:
Node2:
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 160, Error_code: 1451
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 56
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [Warning] WSREP: Failed to apply app buffer: seqno: 56, status: 1
May 30 02:51:01 localhost mysqld: #011 at galera/src/trx_handle.cpp:apply():351
Node3:
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 160, Error_code: 1451
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 56
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [Warning] WSREP: Failed to apply app buffer: seqno: 56, status: 1
May 30 02:51:01 localhost mysqld: #011 at galera/src/trx_handle.cpp:apply():351
Details
Details
Assignee
Reporter
Labels
Priority
Smart Checklist
Open Smart Checklist
Smart Checklist

**Reported in Launchpad by Marcelo Altmann last update 03-10-2017 06:18:30
On a 3 nodes pxc cluster, the whole cluster goes down when foreign key constrain fail.
2017-05-22 20:37:29 983 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 158, Error_code: 1451
2017-05-22 20:37:29 983 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 203
2017-05-22 20:37:29 983 [Warning] WSREP: Failed to apply app buffer: seqno: 203, status: 1
at galera/src/trx_handle.cpp:apply():351
How to reproduce:
simplified verion of https://jira.mariadb.org/browse/MDEV-12398
– Prepare -
CREATE DATABASE fk;
CREATE TABLE fk.parent (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, b varchar(10));
CREATE TABLE fk.child (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, pID INT, CONSTRAINT fk_parent FOREIGN KEY (pID) REFERENCES fk.parent (ID) ON DELETE NO ACTION ON UPDATE NO ACTION);
– Node1
cat << EOF > node1.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN="mysqladmin -u root -psekret"
while : ; do
ID=\`\$MYSQL -BN -e "INSERT INTO fk.parent VALUES (NULL, 'test'); SELECT LAST_INSERT_ID();"\`
sleep 1
\$MYSQL -e "DELETE FROM fk.parent WHERE ID=\$ID"
\$MYSQLADMIN ping > /dev/null
if [[ \$? -ne 0 ]]
then
break
fi
done
EOF
chmod +x node1.sh
– add delay
tc qdisc add dev eth0 root handle 1: netem delay 35ms
./node1.sh
– Node2
cat << EOF > node2.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN="mysqladmin -u root -psekret"
while : ; do
ID=\`\$MYSQL -BN -e "INSERT INTO fk.child SELECT NULL, ID FROM fk.parent ORDER BY ID DESC LIMIT 1; SELECT LAST_INSERT_ID();"\`
sleep 2
\$MYSQL -e "DELETE FROM fk.child WHERE ID=\$ID"
\$MYSQLADMIN ping > /dev/null
if [[ \$? -ne 0 ]]
then
break
fi
done
EOF
chmod +x node2.sh
tc qdisc add dev eth0 root handle 1: netem delay 35ms
./node2.sh
– Cleanup
DELETE FROM fk.child; DELETE FROM fk.parent;
– tested
5.7.17-13-57
5.6.35-81.0-56
my.cnf attached