PXC to PXC async replication broken with "Node has dropped from cluster" after pt-table-checksum
Description
Environment
AFFECTED CS IDs
Attachments
- 11 Jul 2019, 01:57 PM
- 11 Jul 2019, 01:07 PM
- 09 Jul 2019, 05:04 PM
Smart Checklist
Activity
Julia Vural March 4, 2025 at 9:28 PM
It appears that this issue is no longer being worked on, so we are closing it for housekeeping purposes. If you believe the issue still exists, please open a new ticket after confirming it's present in the latest release.
Kamil Holubicki January 24, 2023 at 3:40 PM
Thanks, so we need:
Sveta Smirnova January 24, 2023 at 12:43 PM
@Kamil Holubicki
Percona Toolkit uses a different syntax to set the binlog format:
mysql> set @@binlog_format:='statement';
Query OK, 0 rows affected (0,00 sec)
See also https://jira.percona.com/browse/PXC-1144 and https://jira.percona.com/browse/PXC-4113. I believe we need a clear solution for https://jira.percona.com/browse/PXC-4113
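(For context, the two settings this discussion revolves around can be inspected on a PXC node as follows; a generic sketch, not part of the original comment.)
mysql> SHOW GLOBAL VARIABLES LIKE 'pxc_strict_mode';
mysql> SHOW SESSION VARIABLES LIKE 'binlog_format';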
Kamil Holubicki January 24, 2023 at 10:12 AM
Both 5.7 and 8.0 reject setting binlog_format to STATEMENT:
5.7
mysql> SET SESSION binlog_format=STATEMENT;
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER
8.0
mysql> SET SESSION binlog_format=STATEMENT;
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER
@Sveta Smirnova @Nickolay Ihalainen
Is there anything else here to be done?
Sveta Smirnova September 25, 2019 at 1:04 AM
As agreed in the internal discussion, we need to reject STATEMENT events at the PXC level and clearly describe this limitation in the pt-table-checksum user manual.
Details
Assignee: Unassigned
Reporter: Nickolay Ihalainen (Deactivated)
Time tracking: 2d 5h logged
Affects versions:
Priority: Medium

I've created two 3-node clusters with dbdeployer:
dbdeployer deploy --topology=pxc replication 5.7.25
dbdeployer deploy --topology=pxc replication --sandbox-directory=pxc2_msb_5_7_25 5.7.25
Changed pxc_strict_mode to PERMISSIVE and set a unique server_id on each node in both clusters.
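A minimal sketch of those per-node changes (the SET GLOBAL form is standard PXC; the server_id value is illustrative):
mysql> SET GLOBAL pxc_strict_mode = 'PERMISSIVE';
# in each node's my.sandbox.cnf, under [mysqld] (must differ on every node of both clusters):
server_id = 101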
Created a dsns table with host, port, and the msandbox user/password.
Created a simple InnoDB table with a primary key and two columns (both tables are sketched below).
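For reference, hypothetical DDL matching that description (the dsns layout follows the one documented for --recursion-method=dsn; the DSN value and the width of column b are assumptions, while columns id and b of test.t are implied by the checksum query below):
CREATE TABLE percona.dsns (
  id        INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);
-- one row per node that pt-table-checksum should connect to (value is illustrative)
INSERT INTO percona.dsns (dsn) VALUES ('h=127.0.0.1,P=26226,u=msandbox,p=msandbox');

CREATE TABLE test.t (
  id INT NOT NULL PRIMARY KEY,
  b  VARCHAR(32)
) ENGINE=InnoDB;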
After the pt-table-checksum run, async replication between node1 of cluster1 and node1 of cluster2 was broken:
pt-table-checksum --defaults-file=node1/my.sandbox.cnf --recursion-method=dsn=h=127.0.0.1,P=26229,D=percona,t=dsns,u=msandbox,p=msandbox --tables="test.t"
If I skip the problematic statements, replication continues to work.
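(For reference, a sketch of how a problematic event can be skipped on the async slave, assuming position-based, non-GTID replication; with GTIDs the transaction would have to be skipped by injecting an empty transaction instead. Not part of the original report.)
mysql> STOP SLAVE;
mysql> SET GLOBAL sql_slave_skip_counter = 1;
mysql> START SLAVE;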
If I execute the REPLACE statement from pt-table-checksum, replication breaks again:
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t` /*checksum table*/
This generates the following in the slave node's error log:
2019-06-20T08:27:44.712746Z 44 [Note] Slave I/O thread for channel '': connected to master 'msandbox@127.0.0.1:26226',replication started in log 'mysql-bin.000004' at position 14471
2019-06-20T08:27:44.720005Z 45 [Note] WSREP: Ready state reached
2019-06-20T08:27:44.720098Z 45 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000004' at position 13866, relay log './mysql-relay.000002' position: 1083
2019-06-20T08:27:44.720506Z 45 [Note] WSREP: set_query_id(), assigned new next trx id: 441
2019-06-20T08:27:44.720664Z 45 [Note] WSREP: consistency check: REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`
2019-06-20T08:27:44.720846Z 45 [Note] WSREP: Executing Query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) with write-set (-1) and exec_mode: LOCAL_STATE in TO Isolation mode
2019-06-20T08:27:44.724394Z 45 [Note] WSREP: Query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) with write-set (58) and exec_mode: TOTAL_ORDER replicated in TO Isolation mode
2019-06-20T08:27:44.724517Z 45 [Note] WSREP: wsrep: initiating TOI for write set (58)
2019-06-20T08:27:44.725167Z 45 [Note] WSREP: wsrep: completed TOI write set (58)
2019-06-20T08:27:44.725207Z 45 [Note] WSREP: Setting WSREPXid (InnoDB): 17379cf7-932a-11e9-8ca8-eafccc6e901e:58
2019-06-20T08:27:44.742909Z 45 [Note] WSREP: Completed query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) replication with write-set (58) and exec_mode: TOTAL_ORDER in TO Isolation mode
2019-06-20T08:27:44.750033Z 45 [Note] WSREP: wsrep: replicating commit (-1)
2019-06-20T08:27:44.750093Z 45 [Warning] WSREP: SQL statement ((null)) was not replicated (thd: 45)
2019-06-20T08:27:44.750107Z 45 [Note] WSREP: commit action failed for reason: WSREP_TRX_FAIL THD: 45 Query: (null)
2019-06-20T08:27:44.750116Z 45 [Note] WSREP: conflict state: NO_CONFLICT
2019-06-20T08:27:44.750124Z 45 [Note] WSREP: --------- CONFLICT DETECTED --------
2019-06-20T08:27:44.750132Z 45 [Note] WSREP: cluster conflict due to certification failure for threads:
2019-06-20T08:27:44.750142Z 45 [Note] WSREP: Victim thread: THD: 45, mode: local, state: executing, conflict: cert failure, seqno: -1 SQL: (null)
2019-06-20T08:27:44.750311Z 45 [Note] WSREP: cleanup transaction for LOCAL_STATE: (null)
2019-06-20T08:27:44.750342Z 45 [Warning] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 'Deadlock found when trying to get lock; try restarting transaction', Error_code: 1213
2019-06-20T08:27:44.750356Z 45 [Note] WSREP: Apply Event failed (Reason: 1, Conflict-State: CERT_FAILURE)
2019-06-20T08:27:44.750375Z 45 [ERROR] Slave SQL for channel '': Node has dropped from cluster, Error_code: 1047
2019-06-20T08:27:44.750388Z 45 [Note] Slave SQL thread for channel '' exiting, replication stopped in log 'mysql-bin.000004' at position 13866
Despite the "Node has dropped from cluster" message, the node state after the error is still Primary/Synced.
The error still happens even if I stop all nodes of the second cluster except the one acting as the async slave.
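(For reference, the Primary/Synced state mentioned above corresponds to the standard wsrep status variables, which can be checked on the node with something like the following; a generic sketch, not from the original report.)
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';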