LP #1298689: Possible bug in the in MySQL + Galera and PXC when running PT-table-checksum on a slave.

Description

**Reported in Launchpad by Marco Tusa last update 17-06-2014 10:30:49

Current nodes setup

node1 <> node2 <> node3

 

\/
Slave

The issue is that when running pt-table-checksum on node1 all the statement that pt-table-checksum execute to test the data on the nodes, are replicated with ROW replication.
To note that I am testing on percona toolkit 2.2.7 and the version of pt-table-checksum is implementing the /!99997/ trick as describe at https://groups.google.com/forum/#!msg/codership-team/kKqigq31dVQ/TV-DTUI0YYYJ.

From MySQL point of view I have tested with PXC 5.5.34 and Codership 5.5.43 (wsrep 25.9)

When pt-table-checksum is executed both the servers print the DEBUG information,
140327 15:38:48 [Note] WSREP: consistency check: REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc)
SELECT /!99997/ 'sakila', 'staff', '1', NULL, NULL, NULL, COUNT AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `staff_id`, `first_name`, `last_name`, `address_id`, `picture`, `email`,
`store_id`, `active`, `username`, `password`, `last_update` + 0, CONCAT(ISNULL(`picture`), ISNULL(`email`), ISNULL(`password`))))
AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`staff` /checksum table/

But then the information is passed as ROW format as UPDATE:
BINLOG '
SH40UxMjAAAATAAAAA2SAAAAACQAAAAAAAEAB3BlcmNvbmEACWNoZWNrc3VtcwAM/v4DBA/8/P4D
/gMHDf7A/sAEWAICAv54/nh4Bg==
SH40UxgjAAAAeQAAAIaSAAAAACQAAAAAAAEADP////949gZzYWtpbGEFc3RhZmYBAAAACDIzMzY2
OGFlAgAAAEh+NFNw8AZzYWtpbGEFc3RhZmYBAAAANzMqPggyMzM2NjhhZQIAAAAIMjMzNjY4YWUC
AAAASH40Uw==
'/!/;

  1.  

    1.  

      1. UPDATE percona.checksums
        ### WHERE
        ### @1='sakila' /* STRING(192) meta=65216 nullable=0 is_null=0 */
        ### @2='staff' /* STRING(192) meta=65216 nullable=0 is_null=0 */
        ### @3=1 /* INT meta=0 nullable=0 is_null=0 */
        ### @4=NULL /* INT meta=4 nullable=1 is_null=1 */
        ### @5=NULL /* INT meta=600 nullable=1 is_null=1 */
        ### @6=NULL /* INT meta=2 nullable=1 is_null=1 */
        ### @7=NULL /* INT meta=2 nullable=1 is_null=1 */
        ### @8='233668ae' /* STRING(120) meta=65144 nullable=0 is_null=0 */
        ### @9=2 /* INT meta=0 nullable=0 is_null=0 */
        ### @10=NULL /* INT meta=65144 nullable=1 is_null=1 */
        ### @11=NULL /* INT meta=0 nullable=1 is_null=1 */
        ### @12=1395949128 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
        ### SET
        ### @1='sakila' /* STRING(192) meta=65216 nullable=0 is_null=0 */
        ### @2='staff' /* STRING(192) meta=65216 nullable=0 is_null=0 */
        ### @3=1 /* INT meta=0 nullable=0 is_null=0 */
        ### @4=0.166211 /* FLOAT meta=4 nullable=1 is_null=0 */
        ### @5=NULL /* FLOAT meta=600 nullable=1 is_null=1 */
        ### @6=NULL /* FLOAT meta=2 nullable=1 is_null=1 */
        ### @7=NULL /* FLOAT meta=2 nullable=1 is_null=1 */
        ### @8='233668ae' /* STRING(120) meta=65144 nullable=0 is_null=0 */
        ### @9=2 /* INT meta=0 nullable=0 is_null=0 */
        ### @10='233668ae' /* STRING(120) meta=65144 nullable=1 is_null=0 */
        ### @11=2 /* INT meta=0 nullable=1 is_null=0 */
        ### @12=1395949128 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
        # at 37510

Given that there is no checksum execution on the SLAVE.

pt-table-checksum code related to the patch

  1. ########################################################################

  2. Checksum args and the DMS part of the checksum query for each table.

  3. ########################################################################
    my %crc_args = $rc->get_crc_args(dbh => $master_dbh);
    my $checksum_dml = "REPLACE INTO $repl_table "
    . "(db, tbl, chunk, chunk_index,"
    . " lower_boundary, upper_boundary, this_cnt, this_crc) "
    . "SELECT"
    . ($cluster->is_cluster_node($master_cxn) ? ' /!99997/' : '')
    . " ?, ?, ?, ?, ?, ?,";
    my $past_cols = " COUNT, '0'";

HOW to replicate:
================================
1) install a cluster using either PXC or MySQL/Galera both(5.5)
2) connect another MySQL (simple) as slave to node1
3) load a test schema
4) remove some records on the SLAVE
5) run pt-table-checksum as /pt-table-checksum -u<user> -p<pw> -h 192.168.0.35 -P3306 --databases sakila,test --recursion-method dsn=h=192.168.0.35,P=3306,u=<user>,p=<pw>,D=percona,t=dsns --no-check-binlog-format
dsns table:
(root@localhost:pm) [percona]>select * from dsns;


--------------------------------------------------

id

parent_id

dsn


--------------------------------------------------

6

NULL

h=192.168.0.35,P=5510,u=stress,p=tool


--------------------------------------------------
1 row in set (0.00 sec)
6) review data from the pt-table-checksum and from the query on master/slave
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT AS chunks
FROM percona.checksums
WHERE (
master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;

Environment

None

Smart Checklist

Activity

Lalit Choudhary April 12, 2019 at 1:55 PM
Edited

Hi Marco,

Thank you for the report.

There are lot of changes happen from reported version to the latest version of PXC and Percona Toolkit.

please let us know if see the same issue in the latest version. 

lpjirasync January 24, 2018 at 8:12 PM

**Comment from Launchpad by: Raghavendra D Prabhu on: 17-06-2014 10:29:18

On 30 Mar 2014 01:15, "Marco Tusa" <1298689@bugs.launchpad.net> wrote:
>
> Ragu,
> I know very well that STATEMENT is not the official replication way for
>

>
>
> But this has NOTHING to do with that trick, but with the fact that
> galera is not enforcing the ROW format at session level.

This has been fixed in 5.6 and a warning is emitted in 5.5 (no error here
for backward compatibility).

>
> As such the pt-table-checksum is able to change it (Line 9097)
> if ( $original_binlog_format !~ /STATEMENT/i ) {
> $sql = q{/!50108 SET @@binlog_format := 'STATEMENT'/};
> eval {
> PTDEBUG && _d($dbh, $sql);
> $dbh->do($sql);
> };
>
>
> At the same time if I have other activity running on the other node ...
> or on this Master node, this activity is reported in ROW format:
>
> root@localhost:pm) [sakila]>insert into payment3 select * from payment2;
> #140329 15:18:19 server id 3 end_log_pos 4991358 Write_rows: table id
> 57 flags: STMT_END_F
> ### INSERT INTO sakila.payment3
> ### SET
> ### @1=1 /* SHORTINT meta=0 nullable=1 is_null=0 */
> ### @2=1 /* SHORTINT meta=0 nullable=0 is_null=0 */
> ### @3=1 /* TINYINT meta=0 nullable=0 is_null=0 */
> ### @4=76 /* INT meta=0 nullable=1 is_null=0 */
> ### @5=000000002.990000000 /* DECIMAL(5,2) meta=1282 nullable=0
> is_null=0 */
> ### @6=2005-05-25 11:30:37 /* DATETIME meta=0 nullable=0 is_null=0 */
> ### @7=1140059550 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
> ### INSERT INTO sakila.payment3
>
>
> Given all the above, it seems that WE CAN use pt-table-checksum, but I
> am unsure about the GALERA stability, and WHY then we have that trick.
>
> The parameter wsrep_forced_binlog_format was introduced in Galera on the
> 2011-11-08 (Seppo comment
> https://bugs.launchpad.net/codership-mysql/+bug/887564).
> The trick seems from 4/24/12 Alex comment
> (
https://groups.google.com/forum/#!msg/codership-team/kKqigq31dVQ/TV-DTUI0YYYJ
)
>
> SO it comes AFTER, the wsrep_forced_binlog_format, and as such I am tent
> to assume it was introduced because there was a problem on the use of
> wsrep_forced_binlog_format...
>
> So after all this I think we can summarize my question as:
>
> "If we have to use the pt-table-checksum, will be the usage of
> wsrep_forced_binlog_format=NONE to be consider SAFE and CONSISTENT for
> galera replication, OR we have to assume that this parameter should
> NEVER be set/modify?
>
> In the case it cannot be used, what is the way then to use
> pt-table-checksum.
> In any case what is the reason of having the /!99997/ trick?
> "
> Should be consider all this as a documentation BUG or an dangerous
> exposed feature or a real bug?
>
>
> ------------------------------------------------ > Marco Tusa,
> Mobile: +1-613-282-7337
> do something for your planet!
> http://www.tusacentral.net/
> ------------------------------------------------ > Save a tree... please don't print this e-mail unless you really need to
>
> On 29/03/2014 03:27, Raghavendra D Prabhu wrote:
> >> The issue is that when running pt-table-checksum on node1 all the
statement that pt-table-checksum execute to test the data
> > > on the nodes, are replicated with ROW replication.
> >
> > This is because STATEMENT binlog format is not supported in
> > wsrep/galera. pt-checksum possibly needs fixes for PXC in this regard.
> >
>
> –
> You received this bug notification because you are subscribed to MySQL
> patches by Codership.
> Matching subscriptions: wsrep
> https://bugs.launchpad.net/bugs/1298689
>
> Title:
> Possible bug in the in MySQL + Galera and PXC when running PT-table-
> checksum on a slave.
>
> Status in MySQL patches by Codership:
> New
>
> Bug description:
> Current nodes setup
>
> node1 <> node2 <> node3
> |
> \/
> Slave
>
> The issue is that when running pt-table-checksum on node1 all the
statement that pt-table-checksum execute to test the data on the nodes, are
replicated with ROW replication.
> To note that I am testing on percona toolkit 2.2.7 and the version of
pt-table-checksum is implementing the /!99997/ trick as describe at
https://groups.google.com/forum/#!msg/codership-team/kKqigq31dVQ/TV-DTUI0YYYJ
.
>
> From MySQL point of view I have tested with PXC 5.5.34 and Codership
5.5.43 (wsrep 25.9)
>
> When pt-table-checksum is executed both the servers print the DEBUG
information,
> 140327 15:38:48 [Note] WSREP: consistency check: REPLACE INTO
`percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary,
upper_boundary, this_cnt, this_crc)
> SELECT /!99997/ 'sakila', 'staff', '1', NULL, NULL, NULL, COUNT
AS cnt,
> COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `staff_id`,
`first_name`, `last_name`, `address_id`, `picture`, `email`,
> `store_id`, `active`, `username`, `password`, `last_update` + 0,
CONCAT(ISNULL(`picture`), ISNULL(`email`), ISNULL(`password`))))
> AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`staff` /*checksum
table*/
>
> But then the information is passed as ROW format as UPDATE:
> BINLOG '
>
SH40UxMjAAAATAAAAA2SAAAAACQAAAAAAAEAB3BlcmNvbmEACWNoZWNrc3VtcwAM/v4DBA/8/P4D
> /gMHDf7A/sAEWAICAv54/nh4Bg==
>
SH40UxgjAAAAeQAAAIaSAAAAACQAAAAAAAEADP////949gZzYWtpbGEFc3RhZmYBAAAACDIzMzY2
>
OGFlAgAAAEh+NFNw8AZzYWtpbGEFc3RhZmYBAAAANzMqPggyMzM2NjhhZQIAAAAIMjMzNjY4YWUC
> AAAASH40Uw==
> '/!/;
> ### UPDATE percona.checksums
> ### WHERE
> ### @1='sakila' /* STRING(192) meta=65216 nullable=0
is_null=0 */
> ### @2='staff' /* STRING(192) meta=65216 nullable=0
is_null=0 */
> ### @3=1 /* INT meta=0 nullable=0 is_null=0 */
> ### @4=NULL /* INT meta=4 nullable=1 is_null=1 */
> ### @5=NULL /* INT meta=600 nullable=1 is_null=1 */
> ### @6=NULL /* INT meta=2 nullable=1 is_null=1 */
> ### @7=NULL /* INT meta=2 nullable=1 is_null=1 */
> ### @8='233668ae' /* STRING(120) meta=65144 nullable=0
is_null=0 */
> ### @9=2 /* INT meta=0 nullable=0 is_null=0 */
> ### @10=NULL /* INT meta=65144 nullable=1 is_null=1 */
> ### @11=NULL /* INT meta=0 nullable=1 is_null=1 */
> ### @12=1395949128 /* TIMESTAMP meta=0 nullable=0
is_null=0 */
> ### SET
> ### @1='sakila' /* STRING(192) meta=65216 nullable=0
is_null=0 */
> ### @2='staff' /* STRING(192) meta=65216 nullable=0
is_null=0 */
> ### @3=1 /* INT meta=0 nullable=0 is_null=0 */
> ### @4=0.166211 /* FLOAT meta=4 nullable=1
is_null=0 */
> ### @5=NULL /* FLOAT meta=600 nullable=1 is_null=1 */
> ### @6=NULL /* FLOAT meta=2 nullable=1 is_null=1 */
> ### @7=NULL /* FLOAT meta=2 nullable=1 is_null=1 */
> ### @8='233668ae' /* STRING(120) meta=65144 nullable=0
is_null=0 */
> ### @9=2 /* INT meta=0 nullable=0 is_null=0 */
> ### @10='233668ae' /* STRING(120) meta=65144 nullable=1
is_null=0 */
> ### @11=2 /* INT meta=0 nullable=1 is_null=0 */
> ### @12=1395949128 /* TIMESTAMP meta=0 nullable=0
is_null=0 */
> # at 37510
>
> Given that there is no checksum execution on the SLAVE.
>
>
> pt-table-checksum code related to the patch
> #
########################################################################
> # Checksum args and the DMS part of the checksum query for each
table.
> #
########################################################################
> my %crc_args = $rc->get_crc_args(dbh => $master_dbh);
> my $checksum_dml = "REPLACE INTO $repl_table "
> . "(db, tbl, chunk, chunk_index,"
> . " lower_boundary, upper_boundary, this_cnt,
this_crc) "
> . "SELECT"
> . ($cluster->is_cluster_node($master_cxn) ? '
/!99997/' : '')
> . " ?, ?, ?, ?, ?, ?,";
> my $past_cols = " COUNT, '0'";
>
>
>
>
>
> HOW to replicate:
> ================================
> 1) install a cluster using either PXC or MySQL/Galera both(5.5)
> 2) connect another MySQL (simple) as slave to node1
> 3) load a test schema
> 4) remove some records on the SLAVE
> 5) run pt-table-checksum as /pt-table-checksum -u<user> -p<pw> -h
192.168.0.35 -P3306 --databases sakila,test --recursion-method
dsn=h=192.168.0.35,P=3306,u=<user>,p=<pw>,D=percona,t=dsns
--no-check-binlog-format
> dsns table:
> (root@localhost:pm) [percona]>select * from dsns;
>


--------------------------------------------------
> | id | parent_id | dsn |
>


--------------------------------------------------
> | 6 | NULL | h=192.168.0.35,P=5510,u=stress,p=tool |
>


--------------------------------------------------
> 1 row in set (0.00 sec)
> 6) review data from the pt-table-checksum and from the query on
master/slave
> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT AS
chunks
> FROM percona.checksums
> WHERE (
> master_cnt <> this_cnt
> OR master_crc <> this_crc
> OR ISNULL(master_crc) <> ISNULL(this_crc))
> GROUP BY db, tbl;
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/codership-mysql/+bug/1298689/+subscriptions

lpjirasync January 24, 2018 at 8:12 PM

**Comment from Launchpad by: Marco Tusa on: 29-03-2014 19:40:30

Ragu,
I know very well that STATEMENT is not the official replication way for
galera.

But this is exactly what I am trying to underline, there is an
inconsistency from my point of view.

Let me explain the whole in more details.

If you check the link in the bug report you will see that there was a
"trick" design for the pt-table-checksum, trick that still there given
it report the info in the log when use.
This trick DOES not work when wsrep_forced_binlog_format=ROW.

Given it force the log to stay as ROW whatever instruction.

If we change wsrep_forced_binlog_format=STATEMENT, obviously it works,
but given the incompatibility with the GALERA replication you have 99%
that the node will crash with:
140328 22:43:41 [Warning] WSREP: BF applier failed to
open_and_lock_tables: 1666, fatal: 0 wsrep = (exec_mode: 1
conflict_state: 0 seqno: 291772803)
140328 22:43:41 [ERROR] Slave SQL: Error executing row event: 'Cannot
execute statement: impossible to write to binary log since statement is
in row format and BINLOG_FORMAT = STATEMENT.', Error_code: 1666
140328 22:43:41 [Warning] WSREP: RBR event 2 Update_rows apply warning:
1666, 291772803
140328 22:43:41 [ERROR] WSREP: Failed to apply trx: source:
1f284b88-7401-11e3-afd2-76df52ec0065 version: 2 local: 0 state: APPLYING
flags: 1 conn_id: 37912472 trx_id: 828722030 seqnos (l: 92316516, g:
291772803, s: 291772802, d: 291772802, ts: 1396064621012184336)
140328 22:43:41 [ERROR] WSREP: Failed to apply trx 291772803 10 times
140328 22:43:41 [ERROR] WSREP: Node consistency compromized, aborting...
140328 22:43:41 [Note] WSREP: Closing send monitor...

Testing wsrep_forced_binlog_format=NONE ...
pt-table-checksum is workign fine, the binary log show that the
statement are logged in STATEMENT mode:
#140329 15:03:52 server id 35 end_log_pos 1798 Query
thread_id=44936 exec_time=0 error_code=0
use `sakila`/!/;
SET TIMESTAMP=1396119832/!/;
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index,
lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /!99997/
'sakila', 'actor', '1', NULL, NULL, NULL, COUNT AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`,
`first_name`, `last_name`, `last_update` + 0)) AS UNSIGNED)), 10, 16)),
0) AS crc FROM `sakila`.`actor` /checksum table/
/!/;

  1. at 1798
    #140329 15:03:52 server id 35 end_log_pos 1825 Xid = 306
    COMMIT/!/;

But this has NOTHING to do with that trick, but with the fact that

galera is not enforcing the ROW format at session level.

As such the pt-table-checksum is able to change it (Line 9097)
if ( $original_binlog_format !~ /STATEMENT/i ) {
$sql = q{/!50108 SET @@binlog_format := 'STATEMENT'/};
eval {
PTDEBUG && _d($dbh, $sql);
$dbh->do($sql);
};

At the same time if I have other activity running on the other node ...
or on this Master node, this activity is reported in ROW format:

root@localhost:pm) [sakila]>insert into payment3 select * from payment2;
#140329 15:18:19 server id 3 end_log_pos 4991358 Write_rows: table id
57 flags: STMT_END_F

  1.  

    1.  

      1. INSERT INTO sakila.payment3

      2. SET

      3. @1=1 /* SHORTINT meta=0 nullable=1 is_null=0 */

      4. @2=1 /* SHORTINT meta=0 nullable=0 is_null=0 */

      5. @3=1 /* TINYINT meta=0 nullable=0 is_null=0 */

      6. @4=76 /* INT meta=0 nullable=1 is_null=0 */

      7. @5=000000002.990000000 /* DECIMAL(5,2) meta=1282 nullable=0
        is_null=0 */

      8. @6=2005-05-25 11:30:37 /* DATETIME meta=0 nullable=0 is_null=0 */

      9. @7=1140059550 /* TIMESTAMP meta=0 nullable=0 is_null=0 */

      10. INSERT INTO sakila.payment3

Given all the above, it seems that WE CAN use pt-table-checksum, but I
am unsure about the GALERA stability, and WHY then we have that trick.

The parameter wsrep_forced_binlog_format was introduced in Galera on the
2011-11-08 (Seppo comment
https://bugs.launchpad.net/codership-mysql/+bug/887564).
The trick seems from 4/24/12 Alex comment
(https://groups.google.com/forum/#!msg/codership-team/kKqigq31dVQ/TV-DTUI0YYYJ)

SO it comes AFTER, the wsrep_forced_binlog_format, and as such I am tent
to assume it was introduced because there was a problem on the use of
wsrep_forced_binlog_format...

So after all this I think we can summarize my question as:

"If we have to use the pt-table-checksum, will be the usage of
wsrep_forced_binlog_format=NONE to be consider SAFE and CONSISTENT for
galera replication, OR we have to assume that this parameter should
NEVER be set/modify?

In the case it cannot be used, what is the way then to use
pt-table-checksum.
In any case what is the reason of having the /!99997/ trick?
"
Should be consider all this as a documentation BUG or an dangerous
exposed feature or a real bug?

------------------------------------------------ Marco Tusa,
Mobile: +1-613-282-7337
do something for your planet!
http://www.tusacentral.net/
------------------------------------------------ Save a tree... please don't print this e-mail unless you really need to

On 29/03/2014 03:27, Raghavendra D Prabhu wrote:
>> The issue is that when running pt-table-checksum on node1 all the statement that pt-table-checksum execute to test the data
> > on the nodes, are replicated with ROW replication.
>
> This is because STATEMENT binlog format is not supported in
> wsrep/galera. pt-checksum possibly needs fixes for PXC in this regard.
>

lpjirasync January 24, 2018 at 8:12 PM

**Comment from Launchpad by: Raghavendra D Prabhu on: 29-03-2014 07:27:59

> The issue is that when running pt-table-checksum on node1 all the statement that pt-table-checksum execute to test the data
> on the nodes, are replicated with ROW replication.

This is because STATEMENT binlog format is not supported in wsrep/galera. pt-checksum possibly needs fixes for PXC in this regard.

lpjirasync January 24, 2018 at 8:12 PM

**Comment from Launchpad by: Marco Tusa on: 29-03-2014 00:01:50

Forgot to mention that changing:
(root@localhost:pm) [percona]>set global wsrep_forced_binlog_format='STATEMENT';
Query OK, 0 rows affected (0.00 sec)

It obviously work, but this mean changing a global value and all the operations are then write in statement format

Cannot Reproduce

Details

Assignee

Reporter

Priority

Smart Checklist

Created January 24, 2018 at 8:11 PM
Updated March 4, 2024 at 4:46 PM
Resolved April 12, 2019 at 1:56 PM