PXC to PXC async replication broken with "Node has dropped from cluster" after pt-table-checksum

General

Escalation

Description

I've created two 3-node clusters with dbdeployer:
dbdeployer deploy --topology=pxc replication 5.7.25
dbdeployer deploy --topology=pxc replication --sandbox-directory=pxc2_msb_5_7_25 5.7.25

Changed pxc_strict_mode to PERMISSIVE and set a unique server_id on every node in both clusters.

Created a dsns table with host, port, and msandbox user/password.
Created a simple InnoDB table with a primary key and two columns.

After running pt-table-checksum, async replication between node1 of cluster1 and node1 of cluster2 was broken:
pt-table-checksum --defaults-file=node1/my.sandbox.cnf --recursion-method=dsn=h=127.0.0.1,P=26229,D=percona,t=dsns,u=msandbox,p=msandbox --tables="test.t"
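The `--recursion-method` value above packs a Percona Toolkit DSN into one string. As a minimal sketch of how that string breaks down (h = host, P = port, D = database, t = table, u = user, p = password), the hypothetical helper `parse_dsn` below splits it into its parts. It is an illustration only, not the toolkit's own parser, and it assumes no value contains an escaped comma.

```python
def parse_dsn(dsn):
    """Split a Percona Toolkit style DSN ("k=v,k=v,...") into a dict.

    Keys are case-sensitive: "P" is the port, "p" is the password.
    Assumption: values contain no escaped commas.
    """
    parts = {}
    for item in dsn.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("malformed DSN part: %r" % item)
        parts[key.strip()] = value
    return parts


# The option value used in the command above.
RECURSION_METHOD = "dsn=h=127.0.0.1,P=26229,D=percona,t=dsns,u=msandbox,p=msandbox"

# The part before the first "=" is the recursion method, the rest is the DSN.
method, _, dsn = RECURSION_METHOD.partition("=")
parsed = parse_dsn(dsn)
print(method, parsed)
```

Here the DSN points pt-table-checksum at the `percona.dsns` table on 127.0.0.1:26229, which is where it reads the list of nodes to check.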

If I skip problematic statements, replication continues to work.

If I execute the REPLACE statement from pt-table-checksum, replication breaks again:

REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t` /*checksum table*/

Generates:

2019-06-20T08:27:44.712746Z 44 [Note] Slave I/O thread for channel '': connected to master 'msandbox@127.0.0.1:26226',replication started in log 'mysql-bin.000004' at position 14471
2019-06-20T08:27:44.720005Z 45 [Note] WSREP: Ready state reached
2019-06-20T08:27:44.720098Z 45 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000004' at position 13866, relay log './mysql-relay.000002' position: 1083
2019-06-20T08:27:44.720506Z 45 [Note] WSREP: set_query_id(), assigned new next trx id: 441
2019-06-20T08:27:44.720664Z 45 [Note] WSREP: consistency check: REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`
2019-06-20T08:27:44.720846Z 45 [Note] WSREP: Executing Query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) with write-set (-1) and exec_mode: LOCAL_STATE in TO Isolation mode
2019-06-20T08:27:44.724394Z 45 [Note] WSREP: Query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) with write-set (58) and exec_mode: TOTAL_ORDER replicated in TO Isolation mode
2019-06-20T08:27:44.724517Z 45 [Note] WSREP: wsrep: initiating TOI for write set (58)
2019-06-20T08:27:44.725167Z 45 [Note] WSREP: wsrep: completed TOI write set (58)
2019-06-20T08:27:44.725207Z 45 [Note] WSREP: Setting WSREPXid (InnoDB): 17379cf7-932a-11e9-8ca8-eafccc6e901e:58
2019-06-20T08:27:44.742909Z 45 [Note] WSREP: Completed query (REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/ 'test', 't', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, convert(`b` using utf8mb4), CONCAT(ISNULL(`b`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`) replication with write-set (58) and exec_mode: TOTAL_ORDER in TO Isolation mode
2019-06-20T08:27:44.750033Z 45 [Note] WSREP: wsrep: replicating commit (-1)
2019-06-20T08:27:44.750093Z 45 [Warning] WSREP: SQL statement ((null)) was not replicated (thd: 45)
2019-06-20T08:27:44.750107Z 45 [Note] WSREP: commit action failed for reason: WSREP_TRX_FAIL THD: 45 Query: (null)
2019-06-20T08:27:44.750116Z 45 [Note] WSREP: conflict state: NO_CONFLICT
2019-06-20T08:27:44.750124Z 45 [Note] WSREP: --------- CONFLICT DETECTED --------
2019-06-20T08:27:44.750132Z 45 [Note] WSREP: cluster conflict due to certification failure for threads:

2019-06-20T08:27:44.750142Z 45 [Note] WSREP: Victim thread:
  THD: 45, mode: local, state: executing, conflict: cert failure, seqno: -1
  SQL: (null)

2019-06-20T08:27:44.750311Z 45 [Note] WSREP: cleanup transaction for LOCAL_STATE: (null)
2019-06-20T08:27:44.750342Z 45 [Warning] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 'Deadlock found when trying to get lock; try restarting transaction', Error_code: 1213
2019-06-20T08:27:44.750356Z 45 [Note] WSREP: Apply Event failed (Reason: 1, Conflict-State: CERT_FAILURE)
2019-06-20T08:27:44.750375Z 45 [ERROR] Slave SQL for channel '': Node has dropped from cluster, Error_code: 1047
2019-06-20T08:27:44.750388Z 45 [Note] Slave SQL thread for channel '' exiting, replication stopped in log 'mysql-bin.000004' at position 13866
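The failure signature in the log above is a slave SQL thread reporting Error_code 1213 (deadlock on commit) followed by Error_code 1047 ("Node has dropped from cluster"). A minimal Python sketch for spotting that sequence in an error log follows. The function name `find_dropped_after_deadlock` and both regular expressions are assumptions made for illustration, based only on the line layout shown in this report; other log formats may need different patterns.

```python
import re
from collections import defaultdict

# Leading "timestamp thread_id [Level] message" layout, as seen in the log above.
LINE_RE = re.compile(r"^(\S+Z)\s+(\d+)\s+\[(\w+)\]\s+(.*)$")
ERROR_CODE_RE = re.compile(r"Error_code:\s*(\d+)")


def find_dropped_after_deadlock(log_text):
    """Return sorted thread ids that logged Error_code 1213 and later 1047."""
    codes_by_thread = defaultdict(list)
    current_thread = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            current_thread = int(match.group(2))
            text = match.group(4)
        else:
            # No timestamp prefix: treat as a wrapped continuation of the
            # previous entry and attribute it to the same thread.
            text = raw
        if current_thread is None:
            continue
        for code in ERROR_CODE_RE.findall(text):
            codes_by_thread[current_thread].append(int(code))

    hits = []
    for thread, codes in codes_by_thread.items():
        if 1213 in codes and 1047 in codes[codes.index(1213):]:
            hits.append(thread)
    return sorted(hits)
```

Run against the excerpt above, this should flag thread 45, the slave SQL thread that stopped at position 13866.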

Despite the "Node has dropped from cluster" message, the node state after the error is still Primary/Synced.

The error still happens even if I stop every node in the second cluster except the slave node.

Environment

None

AFFECTED CS IDs

257377

Attachments

test-single.sh

11 Jul, 2019

test.sh

11 Jul, 2019

nickolays_alternative_consistency_check_method.txt

09 Jul, 2019

Linked work items

is blocked by

PXC-4113

Inconsistent behavior of SET SESSION binlog_format='STATEMENT';

Activity


Julia Vural

March 4, 2025 at 9:28 PM

It appears that this issue is no longer being worked on, so we are closing it for housekeeping purposes. If you believe the issue still exists, please open a new ticket after confirming it's present in the latest release.

Kamil Holubicki

January 24, 2023 at 3:40 PM

Thanks, so we need:

Fix https://perconadev.atlassian.net/browse/PXC-4113#icft=PXC-4113
Document

Sveta Smirnova

January 24, 2023 at 12:43 PM

@Kamil Holubicki

Percona Toolkit uses a different form of the statement:

mysql> set @@binlog_format:='statement';
Query OK, 0 rows affected (0,00 sec)

See also https://jira.percona.com/browse/PXC-1144 and https://jira.percona.com/browse/PXC-4113. I believe we need a clear solution for https://jira.percona.com/browse/PXC-4113

Kamil Holubicki

January 24, 2023 at 10:12 AM

Both 5.7 and 8.0 reject binlog format STATEMENT

5.7

mysql> SET SESSION binlog_format=STATEMENT;
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER

8.0

mysql> SET SESSION binlog_format=STATEMENT;
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER
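The rule stated in the error message can be written as a small predicate. This is a sketch derived only from the message text above; the function name `pxc_rejects_binlog_format` is hypothetical, and actual server behavior may differ depending on the SET syntax used (see PXC-4113).

```python
def pxc_rejects_binlog_format(pxc_strict_mode, binlog_format):
    """Return True if, per the ERROR 1105 text, PXC would refuse the setting.

    Assumption: this mirrors the wording of the error message only; it does
    not query a server.
    """
    strict = pxc_strict_mode.strip().upper()
    fmt = binlog_format.strip().strip("'\"").upper()
    return strict in {"ENFORCING", "MASTER"} and fmt in {"STATEMENT", "MIXED"}


# ENFORCING is the mode named in the error message; PERMISSIVE was used in the reproduction.
print(pxc_rejects_binlog_format("ENFORCING", "STATEMENT"))
print(pxc_rejects_binlog_format("PERMISSIVE", "statement"))
```

By this reading, the PERMISSIVE mode used in the reproduction would not trigger the rejection, which is consistent with the statement-based checksum query reaching the slave.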

@Sveta Smirnova @Nickolay Ihalainen

Is there anything else here to be done?

Sveta Smirnova

September 25, 2019 at 1:04 AM

As agreed in the internal discussion, we need to reject STATEMENT events at the PXC level and clearly describe this limitation in the pt-table-checksum user manual.


Won't Do

Details

Assignee

Unassigned

Reporter

Nickolay Ihalainen(Deactivated)

Time tracking

2d 5h logged

Affects versions

5.7.25-31.35

8.0.30-22 (Q3 2022)

Priority

Medium

Created June 20, 2019 at 8:45 AM

Updated March 4, 2025 at 9:28 PM

Resolved March 4, 2025 at 9:28 PM