Redo-log optimized DDL operation causing SST to fail
Activity
ethaniel 1 November 26, 2020 at 9:49 AM
I was able to work around this problem with:
sed -i -e 's/--lock-ddl /--lock-ddl-per-table /g' /bin/wsrep_sst_xtrabackup-v2
This is not the recommended way (the devs suggest cloning wsrep_sst_xtrabackup-v2 into a separate script), but updating the main file did not cause me any harm.
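The cloning approach mentioned in this comment could look roughly like the sketch below: copy the SST script and apply the same substitution to the copy, leaving the packaged file untouched. The helper name patch_sst_copy and the destination path in the usage note are hypothetical, and the sketch assumes GNU sed (for -i). How the server is then pointed at the copy is not covered here.

```shell
#!/bin/bash
# patch_sst_copy SRC DST
# Hypothetical helper: copy the SST script SRC to DST and apply the
# --lock-ddl -> --lock-ddl-per-table substitution to the copy only.
patch_sst_copy() {
    local src="$1" dst="$2"
    # Preserve mode and timestamps so the copy stays executable.
    cp -p -- "$src" "$dst" || return 1
    # Same substitution as in the comment, applied in place to the copy.
    # The trailing space in the pattern keeps it from matching
    # occurrences that are already --lock-ddl-per-table.
    sed -i -e 's/--lock-ddl /--lock-ddl-per-table /g' "$dst"
}
```

For example, `patch_sst_copy /bin/wsrep_sst_xtrabackup-v2 /bin/wsrep_sst_xtrabackup-v2-custom` (the second path is an assumed name) leaves the original script as shipped.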
ethaniel 1 November 25, 2020 at 8:21 PM
Looks like this bug (DDL during SST causes a block) is still active in 5.7.31:
https://jira.percona.com/browse/PXC-3489
Krunal Bauskar February 13, 2018 at 9:18 AM
Existing behavior:
If BACKUP LOCKS are active, then any subsequent DDL is blocked.
If a DDL is blocked, then any subsequent DML is blocked.
The fix for this bug retains the same behavior, except that XB now takes BACKUP LOCKS even for InnoDB tables.
This means that if a user has DDL active on a node acting as DONOR, then that DDL and the DML that follows it will be blocked too. This should not be confused with BACKUP LOCKS blocking DML: the DML is blocked because of the DDL (due to the existing PXC dependency of DML on a blocked DDL).
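One way to observe this dependency on a donor is to look at thread states while SST is running. The sketch below counts the lines of SHOW PROCESSLIST output (read on stdin) that contain a given wait state. The helper name count_waiters is hypothetical, and the state string "Waiting for backup lock" used in the usage note is an assumption; check which state names your server version actually reports.

```shell
#!/bin/bash
# count_waiters STATE
# Hypothetical helper: read processlist text on stdin and print how many
# lines contain STATE as a fixed string.
count_waiters() {
    local state="$1"
    # grep -c prints the count but exits non-zero when the count is 0,
    # so the exit status is normalized with "|| true".
    grep -c -F -- "$state" || true
}
```

Usage, assuming the socket path from the reproduction steps: `./bin/mysql --user root -S /tmp/n1.sock -e 'SHOW PROCESSLIST' | count_waiters "Waiting for backup lock"`.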
Ramesh Sivaraman February 7, 2018 at 10:19 AM
I was able to reproduce the issue.
Test case:
1) Started node1
2) Started sysbench
3) Started test.sh in a loop
4) Started node2 with xtrabackup-v2
2018-02-07T10:21:05.618828Z WSREP_SST: [INFO] Proceeding with SST.........
2018-02-07T10:21:05.644466Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2018-02-07T10:21:07.231497Z 0 [Note] WSREP: (99ec8c7d, 'tcp://127.0.0.1:13208') turning message relay requesting off
2018-02-07T10:21:41.766929Z 0 [Warning] WSREP: 0.0 (qaserver-06): State transfer to 1.0 (qaserver-06) failed: -22 (Invalid argument)
2018-02-07T10:21:41.766948Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():765: Will never receive state. Need to abort.
2018-02-07T10:21:41.767016Z 0 [Note] WSREP: gcomm: terminating thread
2018-02-07T10:21:41.767029Z 0 [Note] WSREP: gcomm: joining thread
2018-02-07T10:21:41.767082Z 0 [Note] WSREP: gcomm: closing backend
2018-02-07T10:21:41.828995Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2018-02-07T10:21:41.829642Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
2018-02-07T10:21:41.830297Z WSREP_SST: [ERROR] ******************************************************
2018-02-07T10:21:41.831052Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2018-02-07T10:21:41.836445Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '127.0.0.1' --datadir '/home/ramesh/workdir/pxc57/node2/' --defaults-file '' --defaults-group-suffix '.1' --parent '28342' '' : 2 (No such file or directory)
2018-02-07T10:21:41.836464Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2018-02-07T10:21:41.836470Z 0 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2018-02-07T10:21:41.836532Z 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2018-02-07T10:21:41.836547Z 0 [ERROR] Aborting
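Step 3 of this test case runs test.sh in an endless loop. For a repeatable run, a bounded variant can be sketched as below. The helper name run_in_loop is hypothetical, and unlike the original endless loop it stops at the first failing iteration, which is a deliberate change rather than what the ticket used.

```shell
#!/bin/bash
# run_in_loop MAX CMD [ARGS...]
# Hypothetical helper: run CMD up to MAX times and stop at the first
# iteration that exits non-zero.
run_in_loop() {
    local max="$1"
    shift
    local i
    for ((i = 1; i <= max; i++)); do
        "$@" || { echo "iteration $i failed" >&2; return 1; }
    done
    echo "completed $max iterations"
}
```

For example, `run_in_loop 100 bash test.sh` runs the workload script at most 100 times.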
Details
Assignee: Krunal Bauskar (Deactivated)
Reporter: Krunal Bauskar (Deactivated)
Time tracking: 3h logged
Priority: High

Description
A redo-log-optimized DDL operation (such as CREATE INDEX with a sorted index build) that does not flush the redo log immediately can cause SST to fail.
To reproduce:
Start a single-node cluster.
Load some initial data for the DDL to run against:
sysbench --threads=1 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=0 --mysql-ignore-errors=all \
  --mysql-user=root --mysql-socket=/tmp/n1.sock /home/krunal.bauskar/tools/sysbench/install/share/sysbench/oltp_insert.lua \
  --mysql-db=test1 --tables=1 --table_size=2000000 prepare
sysbench --threads=1 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=0 --mysql-ignore-errors=all \
  --mysql-user=root --mysql-socket=/tmp/n1.sock /home/krunal.bauskar/tools/sysbench/install/share/sysbench/oltp_insert.lua \
  --mysql-db=test2 --tables=1 --table_size=2000000 prepare
Run the script below (test.sh) in a loop to trigger the DDL workload on node-1:
#!/bin/bash
echo "drop table if exists test1.sb1" | ./bin/mysql --user root -S /tmp/n1.sock -D test
echo "create table test1.sb1 as select id,c from test1.sbtest1 where id < 150000;" | ./bin/mysql --user root -S /tmp/n1.sock -D test
echo "create unique index ix on test1.sb1 (id)" | ./bin/mysql --user root -S /tmp/n1.sock -D test
echo "show tables" | ./bin/mysql --user root -S /tmp/n1.sock -D test1
sleep 1
echo "drop table if exists test2.sb1" | ./bin/mysql --user root -S /tmp/n1.sock -D test
echo "create table test2.sb1 as select id,c from test2.sbtest1 where id < 150000;" | ./bin/mysql --user root -S /tmp/n1.sock -D test
echo "create unique index ix on test2.sb1 (id)" | ./bin/mysql --user root -S /tmp/n1.sock -D test
echo "show tables" | ./bin/mysql --user root -S /tmp/n1.sock -D test2
$ while true; do bash test.sh; done
Start node-2 with a clean data directory so that it joins using SST.
The backup action on the donor (during SST) will fail with the following error:
InnoDB: An optimized (without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet. PXB will not be able take a consistent backup. Retry the backup operation
SST on JOINER is aborted:
2018-02-07T07:50:43.311902Z WSREP_SST: [INFO] Proceeding with SST.........
2018-02-07T07:50:43.351317Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2018-02-07T07:50:45.094016Z 0 [Note] WSREP: (9850103a, 'tcp://10.30.7.164:5030') turning message relay requesting off
2018-02-07T07:51:00.019524Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2018-02-07T07:51:00.020471Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
2018-02-07T07:51:00.021387Z WSREP_SST: [ERROR] ******************************************************
2018-02-07T07:51:00.022508Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2018-02-07T07:51:00.028387Z 0 [Warning] WSREP: 1.0 (n1): State transfer to 0.0 (n2) failed: -22 (Invalid argument)
2018-02-07T07:51:00.028428Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():765: Will never receive state. Need to abort.
2018-02-07T07:51:00.028463Z 0 [Note] WSREP: gcomm: terminating thread
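The joiner log only says that SST failed on the donor; the actual cause is in the donor's backup log. A minimal sketch for telling the two signatures apart is shown below. It matches only on message fragments quoted in this ticket; the helper name classify_sst_log and its output strings are hypothetical, and which log file to pass depends on your setup.

```shell
#!/bin/bash
# classify_sst_log LOGFILE
# Hypothetical helper: report which of the failure signatures quoted in
# this ticket appears in LOGFILE.
classify_sst_log() {
    local log="$1"
    if grep -q -F 'An optimized (without redo logging) DDL' "$log"; then
        # Donor side: the backup refused to proceed after an optimized DDL.
        echo "donor: optimized DDL prevented a consistent backup"
    elif grep -q -F 'xtrabackup_checkpoints missing' "$log"; then
        # Joiner side: generic failure, the cause is in the donor log.
        echo "joiner: SST failed on donor, check the donor log"
    else
        echo "no known SST failure signature"
    fi
}
```

The function takes a single file path, so it can be run against the donor and joiner logs in turn.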