PXC DDL total order implementation issue for TRUNCATE TABLE

General

Escalation

Description

I ran a script against a 3-node PXC cluster running PXC 5.7.26. The script repeatedly executes the following statements in sequence:

TRUNCATE TABLE t1 on node1
INSERT INTO t1 on node1
INSERT INTO t1 on node2
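The sequence above can be sketched as follows. This is a minimal illustration, not the attached script: it assumes two DB-API-style connection objects (`node1`, `node2`) that expose `cursor()` and `commit()`, and because the real schema is only in the attachment, the INSERT statement is passed in by the caller. The function names `run_iteration` and `run_loop` are hypothetical.

```python
def run_iteration(node1, node2, insert_sql, table="t1"):
    """One pass of the reported sequence.

    node1 issues TRUNCATE TABLE (DDL, replicated via TOI) and an INSERT;
    node2 then issues its own INSERT against the same table.
    """
    steps = [
        (node1, f"TRUNCATE TABLE {table}"),
        (node1, insert_sql),
        (node2, insert_sql),
    ]
    for conn, sql in steps:
        cur = conn.cursor()
        cur.execute(sql)
        # Assumption: the driver does not autocommit, so commit explicitly.
        conn.commit()


def run_loop(node1, node2, insert_sql, iterations):
    """Repeat the sequence; the reported failure appears at random iterations."""
    for _ in range(iterations):
        run_iteration(node1, node2, insert_sql)
```

Running the loop tightly is what lets a new TRUNCATE from node1 overlap with work still being applied on the other nodes.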

The script fails at random points with the following error.

I looked into the error logs of all the nodes and observed the following error on node2 and node3, but not on node1, where the TRUNCATE TABLE operation is issued.

I verified on all nodes that the table exists.

Steps to reproduce:

Download PXC 5.7.26.
dbdeployer unpack PXC5.7.26
dbdeployer deploy --topology=pxc replication 5.7.26
Create the database and table using the schema definition in the attachment.
Run the attached script and wait for the issue to occur.

NOTE: The schema definition is included as a comment in the same attachment as the script.

I have tested this on a regular master-master replication setup with 5.7.26, and the issue does not occur there.

Environment

PXC 5.7.26 - 3 node cluster

Ubuntu 18.04

AFFECTED CS IDs

261986

Attachments

Activity

Sveta Smirnova July 29, 2019 at 11:27 AM

I do not agree that this is a bug: TRUNCATE TABLE is a DDL operation ("DROP + CREATE").

Please check https://galeracluster.com/library/documentation/schema-upgrades.html:

The main advantage of Total Order Isolation is its simplicity and predictability, which guarantees data consistency. Additionally, when using Total Order Isolation, you should take the following particularities into consideration:

From the perspective of certification, schema upgrades in Total Order Isolation never conflict with preceding transactions, given that they only execute after the cluster commits all preceding transactions. What this means is that the certification interval for schema changes using this method has a zero length. Therefore, schema changes will never fail certification and their execution is guaranteed.
Transactions that were in progress while the DDL was running and that involved the same database resource will get a deadlock error at commit time and will be rolled back.
The cluster replicates the schema change query as a statement before its execution. There is no way to know whether or not individual nodes succeed in processing the query. This prevents error checking on schema changes in Total Order Isolation.

So deadlocks are expected:

Transactions that were in progress while the DDL was running and that involved the same database resource will get a deadlock error at commit time and will be rolled back.
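Because the quoted documentation says such transactions are rolled back with a deadlock error, a client that runs DML concurrently with TOI DDL would typically re-run the whole transaction. The sketch below is one possible client-side approach, not part of PXC. It assumes the driver exposes the MySQL error number either as `exc.errno` or as `exc.args[0]`. 1213 is MySQL's ER_LOCK_DEADLOCK ("Deadlock found when trying to get lock"). `retry_on_deadlock` and `mysql_errno` are hypothetical helpers.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

ER_LOCK_DEADLOCK = 1213  # MySQL "Deadlock found when trying to get lock"


def mysql_errno(exc: BaseException) -> Optional[int]:
    """Best-effort error-number lookup; attribute names vary by driver (assumption)."""
    errno = getattr(exc, "errno", None)
    if errno is None and exc.args and isinstance(exc.args[0], int):
        errno = exc.args[0]
    return errno


def retry_on_deadlock(txn: Callable[[], T], attempts: int = 3,
                      backoff_s: float = 0.1) -> T:
    """Run txn(), re-running it from the start when it fails with a deadlock.

    txn must execute the entire transaction (BEGIN ... COMMIT), because the
    rolled-back work has to be replayed, not just the final COMMIT.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except Exception as exc:
            # Non-deadlock errors, or the last failed attempt, propagate.
            if mysql_errno(exc) != ER_LOCK_DEADLOCK or attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise AssertionError("unreachable")
```

Retrying only addresses the client-side deadlock. It does not change how DDL is applied on the other nodes.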

Regarding:

[ERROR] Slave SQL: Error 'Table 'DB261986.t' doesn't exist' on query. Default database: ''. Query: 'TRUNCATE DB261986.t', Error_code: 1146

This is expected too, because there is no guarantee that TRUNCATE TABLE will complete on all nodes within a given amount of time. Since the attached script runs TRUNCATE TABLE in a loop, there is a chance that a TRUNCATE will still be executing on node 2 or node 3 when a new TRUNCATE command arrives from node 1.

Uday Varagani July 26, 2019 at 1:17 PM

I have verified the steps provided once again and could see some information in the logs when running in debug mode. I am unable to attach the logs here, even though they are less than 30K in size.

I am copying a snippet here from one of the nodes (node2).

Resolution: Not a Bug

Details

Assignee

Unassigned

Reporter

Uday Varagani(Deactivated)

Labels

PXC-5.7, TOI

Time tracking

3h 15m logged

Affects versions

5.7.26-31.37

Priority

High

Created July 26, 2019 at 5:42 AM

Updated March 6, 2024 at 10:07 PM

Resolved December 20, 2021 at 8:10 PM