PXC to PXC async replication broken with "Node has dropped from cluster" after pt-table-checksum

Description

I created two 3-node clusters with dbdeployer:
dbdeployer deploy --topology=pxc replication 5.7.25
dbdeployer deploy --topology=pxc replication --sandbox-directory=pxc2_msb_5_7_25 5.7.25

I changed pxc_strict_mode to PERMISSIVE and made server_id unique across all nodes in both clusters.
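A minimal sketch of that configuration applied at runtime. It assumes the statements run on each node, and the server_id value is hypothetical: it only has to be different on each of the six nodes. To persist it, the same settings would also go into each node's my.sandbox.cnf.

```sql
-- Run on every node of both clusters.
-- PERMISSIVE makes PXC log strict-mode violations (for example a
-- STATEMENT binlog_format) instead of rejecting them.
SET GLOBAL pxc_strict_mode = 'PERMISSIVE';

-- Hypothetical value: each of the six nodes needs a distinct server_id.
SET GLOBAL server_id = 101;

-- Verify both settings.
SELECT @@GLOBAL.pxc_strict_mode, @@GLOBAL.server_id;
```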

I created a dsns table with the host, port, and msandbox user/password of the replicas.
I created a simple InnoDB table with a primary key and two columns.
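The two tables could look roughly like this sketch. The dsns layout follows the one documented for pt-table-checksum's --recursion-method=dsn. The column types of test.t are assumptions inferred from the checksum query: an integer id that is never wrapped in ISNULL() and a nullable character column b. The replica port in the INSERT is a hypothetical placeholder.

```sql
-- DSN table read via --recursion-method=dsn=...,D=percona,t=dsns
CREATE DATABASE IF NOT EXISTS percona;
CREATE TABLE percona.dsns (
  id        INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);

-- One row per replica to check; 26232 is a hypothetical port.
INSERT INTO percona.dsns (dsn)
VALUES ('h=127.0.0.1,P=26232,u=msandbox,p=msandbox');

-- Test table with assumed column types.
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE test.t (
  id INT NOT NULL,
  b  VARCHAR(32) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
```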

After running pt-t-c (pt-table-checksum), asynchronous replication between node1 of cluster1 and node1 of cluster2 broke:
pt-table-checksum --defaults-file=node1/my.sandbox.cnf --recursion-method=dsn=h=127.0.0.1,P=26229,D=percona,t=dsns,u=msandbox,p=msandbox --tables="test.t"

If I skip the problematic statements, replication continues to work.
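For reference, here is a sketch of skipping a single event on node1 of cluster2. The log below shows binary log file/position coordinates, so the sketch assumes GTIDs are not in use. With GTIDs, sql_slave_skip_counter cannot be used and an empty transaction would have to be injected instead.

```sql
STOP SLAVE;
-- Skip the next event group from the relay log (non-GTID replication only).
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
```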

If I then manually execute the REPLACE statement issued by pt-t-c, replication breaks again:

REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t` /*checksum table*/

This produces the following in the error log of node1 of cluster2 (the replica):

2019-06-20T08:27:44.712746Z 44 [Note] Slave I/O thread for channel '': connected to master 'msandbox@127.0.0.1:26226',replication started in log 'mysql-bin.000004' at position 14471
2019-06-20T08:27:44.720005Z 45 [Note] WSREP: Ready state reached
2019-06-20T08:27:44.720098Z 45 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000004' at position 13866, relay log './mysql-relay.000002' position: 1083
2019-06-20T08:27:44.720506Z 45 [Note] WSREP: set_query_id(), assigned new next trx id: 441
2019-06-20T08:27:44.720664Z 45 [Note] WSREP: consistency check: REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`
2019-06-20T08:27:44.720846Z 45 [Note] WSREP: Executing Query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) with write-set (-1) and exec_mode: LOCAL_STATE in TO Isolation mode
2019-06-20T08:27:44.724394Z 45 [Note] WSREP: Query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) with write-set (58) and exec_mode: TOTAL_ORDER replicated in TO Isolation mode
2019-06-20T08:27:44.724517Z 45 [Note] WSREP: wsrep: initiating TOI for write set (58)
2019-06-20T08:27:44.725167Z 45 [Note] WSREP: wsrep: completed TOI write set (58)
2019-06-20T08:27:44.725207Z 45 [Note] WSREP: Setting WSREPXid (InnoDB): 17379cf7-932a-11e9-8ca8-eafccc6e901e:58
2019-06-20T08:27:44.742909Z 45 [Note] WSREP: Completed query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) replication with write-set (58) and exec_mode: TOTAL_ORDER in TO Isolation mode
2019-06-20T08:27:44.750033Z 45 [Note] WSREP: wsrep: replicating commit (-1)
2019-06-20T08:27:44.750093Z 45 [Warning] WSREP: SQL statement ((null)) was not replicated (thd: 45)
2019-06-20T08:27:44.750107Z 45 [Note] WSREP: commit action failed for reason: WSREP_TRX_FAIL THD: 45 Query: (null)
2019-06-20T08:27:44.750116Z 45 [Note] WSREP: conflict state: NO_CONFLICT
2019-06-20T08:27:44.750124Z 45 [Note] WSREP: --------- CONFLICT DETECTED --------
2019-06-20T08:27:44.750132Z 45 [Note] WSREP: cluster conflict due to certification failure for threads:
2019-06-20T08:27:44.750142Z 45 [Note] WSREP: Victim thread: THD: 45, mode: local, state: executing, conflict: cert failure, seqno: -1 SQL: (null)
2019-06-20T08:27:44.750311Z 45 [Note] WSREP: cleanup transaction for LOCAL_STATE: (null)
2019-06-20T08:27:44.750342Z 45 [Warning] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 'Deadlock found when trying to get lock; try restarting transaction', Error_code: 1213
2019-06-20T08:27:44.750356Z 45 [Note] WSREP: Apply Event failed (Reason: 1, Conflict-State: CERT_FAILURE)
2019-06-20T08:27:44.750375Z 45 [ERROR] Slave SQL for channel '': Node has dropped from cluster, Error_code: 1047
2019-06-20T08:27:44.750388Z 45 [Note] Slave SQL thread for channel '' exiting, replication stopped in log 'mysql-bin.000004' at position 13866

Despite the "Node has dropped from cluster" message, the node's state after the error is still Primary/Synced.
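Here is a sketch of how both observations can be checked side by side on the replica. The status variable names are the standard Galera/PXC ones. The values in the comments are the ones this report describes, not guaranteed output.

```sql
-- Replication side: Last_SQL_Errno is expected to show 1047 after the failure.
SHOW SLAVE STATUS;

-- Cluster side: reported here as still Primary / Synced.
SHOW GLOBAL STATUS
WHERE Variable_name IN ('wsrep_cluster_status',
                        'wsrep_local_state_comment',
                        'wsrep_ready');
```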

The error still occurs even if I stop all nodes of the second cluster except the replica.

Environment

None

AFFECTED CS IDs

257377

Attachments

3
  • 11 Jul 2019, 01:57 PM
  • 11 Jul 2019, 01:07 PM
  • 09 Jul 2019, 05:04 PM

Activity

Julia Vural March 4, 2025 at 9:28 PM

It appears that this issue is no longer being worked on, so we are closing it for housekeeping purposes. If you believe the issue still exists, please open a new ticket after confirming it's present in the latest release.

Kamil Holubicki January 24, 2023 at 3:40 PM

Sveta Smirnova January 24, 2023 at 12:43 PM

Percona Toolkit uses a different binlog format for its session:

mysql> set @@binlog_format:='statement';
Query OK, 0 rows affected (0,00 sec)

See also https://jira.percona.com/browse/PXC-1144 and https://jira.percona.com/browse/PXC-4113. I believe we need a clear solution for https://jira.percona.com/browse/PXC-4113

Kamil Holubicki January 24, 2023 at 10:12 AM

Both 5.7 and 8.0 reject binlog_format STATEMENT:

5.7

mysql> SET SESSION binlog_format=STATEMENT;
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER

8.0

mysql> SET SESSION binlog_format=STATEMENT;
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER
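Per the error text, this rejection applies only under ENFORCING or MASTER. The original reproduction used pxc_strict_mode = PERMISSIVE, which is why pt-t-c could still switch its session to STATEMENT there. The following sketch shows the contrast. It assumes that changing the global pxc_strict_mode takes effect for the next SET in the same session and that PERMISSIVE only warns instead of raising ERROR 1105.

```sql
SET GLOBAL pxc_strict_mode = 'ENFORCING';
SET SESSION binlog_format = 'STATEMENT';   -- rejected with ERROR 1105 (HY000)

SET GLOBAL pxc_strict_mode = 'PERMISSIVE';
SET SESSION binlog_format = 'STATEMENT';   -- expected to be accepted with a warning
SELECT @@SESSION.binlog_format;
```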

Is there anything else here to be done?

Sveta Smirnova September 25, 2019 at 1:04 AM

As agreed in the internal discussion, we need to reject STATEMENT-format events at the PXC level and clearly document this limitation in the pt-table-checksum user manual.

Won't Do

Details

Time tracking

2d 5h logged

Created June 20, 2019 at 8:45 AM
Updated March 4, 2025 at 9:28 PM
Resolved March 4, 2025 at 9:28 PM
