Modify wsrep_row_upd_check_foreign_constraints() to remove the check for DELETE
Activity

Venkatesh Prasad August 4, 2020 at 12:27 PM
Merged the fix to .7 and .0.
: Unexpected ERROR 1205 modifying a child table in a FK relationship
Problem:
When deleting from or updating a child table in a FK relationship while
the parent table's referenced rows are locked, the operation on the child
table blocks and then fails with a lock wait timeout error (ERROR 1205) as
soon as the parent rows are unlocked.
Analysis:
1. For DELETE query
This is the stacktrace of the DELETE query while waiting for the parent
rows to be unlocked.
However, the same DELETE query doesn't wait in .7.
In .7, when a row is modified by an UPDATE/DELETE query, any secondary
index defined on the table is updated as well. To do that, the server first
marks the row as deleted in the secondary index and then, if it is an UPDATE
query, inserts an entry with the new value into the index. This update of
the secondary index is necessary only for UPDATE queries; there is no need
to do the same for DELETE queries.
However, this was not being done properly in .7 because of the special
handling in WSREP for foreign keys in InnoDB. As a result, even a DELETE
query inadvertently updated the secondary index
(row_ins_check_foreign_constraint()), which tried to lock the parent table
and thus caused the lock wait timeout error.
2. For UPDATE query
During an UPDATE/DELETE, whenever we fail to acquire a lock on a row, the
transaction status is set to DB_LOCK_WAIT and the transaction enters a wait
state until the row is unlocked; when the row is unlocked, the operation is
retried.
However, in the case of PXC, the temporary DB_LOCK_WAIT during foreign key
checks was being translated into a permanent DB_LOCK_WAIT_TIMEOUT because
of a fallthrough in the error handling code, causing the query to fail
without retrying the operation.
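The fallthrough described above can be illustrated with a minimal, self-contained C++ sketch. The `DbErr` enum, the two `handle_fk_error_*` functions and `run_with_retry` are hypothetical names made up for this illustration; they are not the actual PXC/InnoDB code, only a simplified model of how a case label without its own return can turn a temporary lock wait into a timeout.

```cpp
#include <cassert>

// Hypothetical, simplified error codes. The names echo DB_LOCK_WAIT and
// DB_LOCK_WAIT_TIMEOUT from the analysis, but this enum is illustrative only.
enum class DbErr { Success, LockWait, LockWaitTimeout };

// Buggy shape (assumed): the LockWait label has no body of its own, so
// control falls through and the temporary wait is reported as a timeout.
inline DbErr handle_fk_error_buggy(DbErr err) {
  switch (err) {
    case DbErr::LockWait:
      // Intended: keep the LockWait status so the caller waits and retries.
    case DbErr::LockWaitTimeout:
      return DbErr::LockWaitTimeout;
    default:
      return err;
  }
}

// Fixed shape: LockWait is returned unchanged, so the caller can retry.
inline DbErr handle_fk_error_fixed(DbErr err) {
  switch (err) {
    case DbErr::LockWait:
      return DbErr::LockWait;
    case DbErr::LockWaitTimeout:
      return DbErr::LockWaitTimeout;
    default:
      return err;
  }
}

// Toy retry loop: re-runs `op` while the handler reports a temporary lock
// wait. A real server would suspend the thread until the lock is granted;
// here a bounded attempt count stands in for that.
template <typename Op, typename Handler>
DbErr run_with_retry(Op op, Handler handle, int max_attempts) {
  assert(max_attempts > 0);
  DbErr err = DbErr::Success;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    err = handle(op());
    if (err != DbErr::LockWait) break;
  }
  return err;
}
```

In this model, an operation that would succeed on its third attempt fails on the first lock wait when `handle_fk_error_buggy` is used, and is retried until it succeeds when `handle_fk_error_fixed` is used.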
Fix:
1. The server is made to behave the same way as .7 by removing the check
for DELETE queries in the `wsrep_row_upd_check_foreign_constraints()`
function, which caused the thread to enter the wait.
2. DB_LOCK_WAIT error is now properly handled and the server retries the
operation.
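Fix item 1 can be pictured with a toy C++ model of the secondary index handling. Everything here is an assumption made for illustration: `StmtKind`, `sec_index_steps` and the `check_fk_on_delete` flag are hypothetical and do not reproduce the real `wsrep_row_upd_check_foreign_constraints()` logic. The sketch only shows why dropping the DELETE branch removes the parent-row check, and with it the lock wait.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical statement kinds for this toy model.
enum class StmtKind { Update, Delete };

// Returns the steps the toy model performs on a secondary index entry.
// `check_fk_on_delete` models the pre-fix behavior (true) versus the
// post-fix behavior (false); it is not a real server option.
inline std::vector<std::string> sec_index_steps(StmtKind kind,
                                                bool check_fk_on_delete) {
  std::vector<std::string> steps{"delete-mark old entry"};
  if (kind == StmtKind::Update) {
    // An UPDATE inserts a new entry, so the parent row must be checked.
    steps.push_back("check parent row (FK)");
    steps.push_back("insert new entry");
  } else if (check_fk_on_delete) {
    // Pre-fix: DELETE also reached the parent check and could block there.
    steps.push_back("check parent row (FK)");
  }
  return steps;
}

// Small usage example comparing the pre-fix and post-fix DELETE paths.
inline void demo() {
  assert(sec_index_steps(StmtKind::Delete, true).size() == 2);   // pre-fix
  assert(sec_index_steps(StmtKind::Delete, false).size() == 1);  // post-fix
}
```

After the fix, the DELETE path in this model consists of the delete-mark step alone, so it never touches the locked parent row.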
Additionally, this patch
1. Adds new foreign key tests to the galera suite, each invoking its
respective InnoDB test with galera enabled.
2. Re-enables the disabled `galera_fk_multitable` and
`galera_toi_ddl_fk_insert` test cases.

Dany Davila July 22, 2020 at 7:55 PM
I was able to reproduce this defect in all versions with the WSREP plugin enabled.
I was not able to reproduce this defect on MySQL Async Replication.
All tests used a two-node cluster, with the WSREP plugin disabled for MySQL Async Replication
(the 1st node as Primary/Leader/Writer and the 2nd acting as Replica/Read Only).


When deleting from or updating a child table in a FK relationship, if the parent table has the referenced row(s) locked, the operation on the child table will get locked and then fail as soon as the parent table is unlocked. For the UPDATE case, the statement must touch a column that is part of the constraint.
Reproduction:
setup
Session 1
Session 2
Session 1
Session 2
The timing of the actions doesn't matter (only their order), and neither do the lock timeout settings.
The issue does not reproduce on a regular Percona Server 5.7.29.
There are two problems here:
1. DELETE on a child table gets locked, which doesn't happen on a regular PS. Maybe there's a reason for that.
2. A DELETE/UPDATE that is locked on a child table will always get ERROR 1205. Effectively, once the operation is locked in this case, it will never finish without an error.
This could have been introduced by this commit: https://github.com/percona/percona-xtradb-cluster/commit/fd8b7c0f151b97980de3378f7f52c2bfe0236867
Reproduces on PXC 5.7.28 and PXC 8.0.18. Doesn't reproduce on PXC 5.7.14, where the DELETE doesn't block at all.