Galera support in pt-table-checksum

Description

According to the documentation, Galera Cluster is supported. I think this needs clarification, because in my experience it does not appear to be correct.

At two different customers, one running Percona Server 8.0.41-32.1 and the other running MariaDB 10.6, I found that running pt-table-checksum on a Galera cluster that has a replica caused the cluster to hang. The entire cluster stayed stuck until the node where pt-table-checksum was running was killed with kill -9.

In both cases I ran the tool on a reader node. On the writer node I saw pt-table-checksum chunks running indefinitely, with other transactions on the same table waiting for InnoDB locks.

My goal was to check the replica, so I did not verify whether the checksum table on the other cluster node contained useful results.

The documentation states that Galera is supported. It does not state that the tool must be run on a specific node. This report is to share that running it on a reader node does not work. If the tool only works on the writer, then the following should be taken into account:

  • The cluster cannot have DML on the table(s) being checked on the cluster nodes that the tool is replicating to

  • The tool cannot work on round-robin clusters

  • When using different cluster nodes on

If the situation I am describing is meant to be supported, then this is a bug that has been present for a few years. As noted, I did not test the tool on the writer node.

In any case, I think the documentation should be updated in the Galera section that describes checking a replica of the cluster, to state that the only safe way to run the tool is to disable Galera replication by passing --set-vars="wsrep_on=0" as a parameter.
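
As a rough illustration of the workaround suggested above, the sketch below only prints the command line and does not run it. The host name is a made-up placeholder, and whether wsrep_on=0 is actually safe for a given topology is an assumption taken from this report, not something the documentation confirms.

```shell
#!/bin/sh
# Sketch only: prints the proposed pt-table-checksum invocation, does not run it.
# NODE_HOST is a hypothetical placeholder, not a host from this report.
NODE_HOST="galera-node1.example"

# --host and --set-vars are existing pt-table-checksum options; wsrep_on=0 is
# the setting proposed in this report and is an untested assumption here.
build_checksum_cmd() {
  echo "pt-table-checksum --host=$1 --set-vars=wsrep_on=0"
}

build_checksum_cmd "$NODE_HOST"
```

Printing the command first makes it easy to review which node it targets before anything touches the cluster.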

Thank you for taking this issue into consideration,

Michaël

Environment

None

Details

Needs QA

Yes

Created 9 hours ago
Updated 9 hours ago