pt-table-sync is not attempting to sync data between two tables even though there are row differences.

Description

pt-table-sync --version
pt-table-sync 3.0.13

I have two tables with differences, but pt-table-sync is not recognizing the differences when it attempts to sync.

I attached a database dump of the tables, percona_sync_test.sql, with two tables sr_versions_editions_src and sr_versions_editions_dst. You can see the differences in the tables with these query (preferredAssembly has changed):

 

mysql> SELECT musicID, editionID, versionID, preferredAssembly FROM sr_versions_editions_src WHERE musicID = 30652 AND editionID IN ('12204AY', '30141069'); +---------+-----------+-----------+-------------------+ | musicID | editionID | versionID | preferredAssembly | +---------+-----------+-----------+-------------------+ | 30652 | 12204AY | 114916 | 0 | | 30652 | 12204AY | 142051 | 0 | | 30652 | 30141069 | 201819 | 0 | | 30652 | 30141069 | 201821 | 1 | +---------+-----------+-----------+-------------------+ mysql> SELECT musicID, editionID, versionID, preferredAssembly FROM sr_versions_editions_dst WHERE musicID = 30652 AND editionID IN ('12204AY', '30141069'); +---------+-----------+-----------+-------------------+ | musicID | editionID | versionID | preferredAssembly | +---------+-----------+-----------+-------------------+ | 30652 | 12204AY | 114916 | 1 | | 30652 | 12204AY | 142051 | 0 | | 30652 | 30141069 | 201819 | 0 | | 30652 | 30141069 | 201821 | 0 | +---------+-----------+-----------+-------------------+

When running the sync tool it does not detect the changes:

 

[xxx@localhost ~]$ pt-table-sync --print --verbose --verbose --no-check-triggers h=localhost,D=percona_sync_test,t=sr_versions_editions_src h=localhost,D=percona_sync_test,t=sr_versions_editions_dst # A software update is available: # Syncing D=percona_sync_test,h=dragon,t=sr_versions_editions_dst # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE # SELECT /*percona_sync_test.sr_versions_editions_dst:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_dst` FORCE INDEX (`music_edition_version`) WHERE (1=1) FOR UPDATE # 0 0 0 0 Nibble 16:04:02 16:04:02 0 percona_sync_test.sr_versions_editions_src

If I run the sql printed out I can see the hashes are the same. I attached a full log file with PTDEBUG enabled. Interestingly, if I go into the code and manually force it to not use the BIT_XOR algorithm by changing line 4982 from:

if ( $algorithm eq 'BIT_XOR' ) {

to:

if ( $algorithm ne 'BIT_XOR' ) {

The old checksum algorithm detects the changes and runs the updates.

[xxx@localhost ~]$ ./pt-table-sync --print --verbose --verbose --no-check-triggers h=dragon,D=percona_sync_test,t=sr_versions_editions_src h=dragon,D=percona_sync_test,t=sr_versions_editions_dst # A software update is available: # Syncing D=percona_sync_test,h=dragon,t=sr_versions_editions_dst # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE # SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(RIGHT(MAX(@crc := CONCAT(LPAD(@cnt := @cnt + 1, 16, '0'), CONV(CAST(CRC32(CONCAT(@crc, CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))))) AS UNSIGNED), 10, 16))), 16), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE # SELECT /*percona_sync_test.sr_versions_editions_dst:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(RIGHT(MAX(@crc := CONCAT(LPAD(@cnt := @cnt + 1, 16, '0'), CONV(CAST(CRC32(CONCAT(@crc, CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))))) AS UNSIGNED), 10, 16))), 16), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_dst` FORCE INDEX (`music_edition_version`) WHERE (1=1) FOR UPDATE # SELECT /*rows in nibble*/ `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS __crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) ORDER BY `musicid`, `editionid`, `versionid` LOCK IN SHARE MODE # SELECT /*rows in nibble*/ `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS __crc FROM `percona_sync_test`.`sr_versions_editions_dst` FORCE INDEX (`music_edition_version`) WHERE (1=1) ORDER BY `musicid`, `editionid`, `versionid` FOR UPDATE UPDATE `percona_sync_test`.`sr_versions_editions_dst` SET `versiontype`='Assembly (Unison)', `pdffilename`='12204AY_ea01397_110618.pdf', `pdfdate`='0000-00-00 00:00:00', `revisionid`='AB', `rec_playlistid`='', `rec_selecteditemid`='', `eperms`='yes', `editionspecifictitle`='Fly Like a Bird', `songid`='83104', `kp_versiontoedition`='197509', `pdf_status`='Complete', `isrc`='', `expandeddescription`='', `preferredassembly`='0' WHERE `musicid`='30652' AND `editionid`='12204AY' AND `versionid`='114916' LIMIT 1 /*percona-toolkit src_db:percona_sync_test src_tbl:sr_versions_editions_src src_dsn:D=percona_sync_test,h=dragon,t=sr_versions_editions_src dst_db:percona_sync_test dst_tbl:sr_versions_editions_dst dst_dsn:D=percona_sync_test,h=dragon,t=sr_versions_editions_dst lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:117898 user:coreym host:fig.ocp.org*/; UPDATE `percona_sync_test`.`sr_versions_editions_dst` SET `versiontype`='Assembly (Unison)', `pdffilename`='30141069_XA01397-001_113628.pdf', `pdfdate`='0000-00-00 00:00:00', `revisionid`='AM', `rec_playlistid`='', `rec_selecteditemid`='', `eperms`='no', `editionspecifictitle`='Fly Like a Bird/Suba Mi Alma', `songid`='83104', `kp_versiontoedition`='589971', `pdf_status`='Complete', `isrc`='', `expandeddescription`='', `preferredassembly`='1' WHERE `musicid`='30652' AND `editionid`='30141069' AND `versionid`='201821' LIMIT 1 /*percona-toolkit src_db:percona_sync_test src_tbl:sr_versions_editions_src src_dsn:D=percona_sync_test,h=dragon,t=sr_versions_editions_src dst_db:percona_sync_test dst_tbl:sr_versions_editions_dst dst_dsn:D=percona_sync_test,h=dragon,t=sr_versions_editions_dst lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:117898 user:coreym host:fig.ocp.org*/; # 0 0 0 2 Nibble 16:11:21 16:11:21 2 percona_sync_test.sr_versions_editions_src

If I run the sql printed there I can see different hashes.

Environment

CentOS Linux release 7.6.1810 (Core)

Mysql version:

Server version: 5.6.21-log MySQL Community Server (GPL)

pt-table-sync --version
pt-table-sync 3.0.13

 

perl --version

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 39 registered patches, see perl -V for more detail)

Attachments

2
  • 21 Mar 2019, 11:17 PM
  • 21 Mar 2019, 11:16 PM

Smart Checklist

Activity

Show:

Fredric Johansson November 21, 2021 at 9:21 AM

I hit the same issue when attempting to sync a table which only had differences in casing in a varchar. As long as there was even number of differences in a position i.e. ('s'),('s') vs ('S'),('S') in the same chunk, p-t-s wont detect it. It is clearly not the same data but the use of BIT_XOR() makes p-t-s think it is, and it will skip the chunk

Rich Vigorito August 27, 2020 at 9:36 PM

Was curious if there was a plan to ever address this?

Hans Bull August 12, 2019 at 4:51 AM

Lalit Choudhary April 15, 2019 at 3:00 PM

Hi

Thank you for the report.

validated with pt-table-sync 3.0.13 and PS 5.6.43

Corey Mohler March 22, 2019 at 9:20 PM

You might be on to something Richard, because changing the hashing algorithm from CRC32 to MD5 will show a difference in the hash:

 

mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e3a341be | +-----------+-----+----------+ 1 row in set (0.01 sec) mysql> SELECT /*percona_sync_test.sr_versions_editions_dst:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_dst` FORCE INDEX (`music_edition_version`) WHERE (1=1) FOR UPDATE -> ; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e3a341be | +-----------+-----+----------+ 1 row in set (0.00 sec) mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(MD5(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+---------------+ | chunk_num | cnt | crc | +-----------+-----+---------------+ | 0 | 288 | 30aeb20d89271 | +-----------+-----+---------------+ 1 row in set, 288 warnings (0.00 sec) mysql> SELECT /*percona_sync_test.sr_versions_editions_dst:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(MD5(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_dst` FORCE INDEX (`music_edition_version`) WHERE (1=1) FOR UPDATE -> ; +-----------+-----+---------------+ | chunk_num | cnt | crc | +-----------+-----+---------------+ | 0 | 288 | 30aeb20d89224 | +-----------+-----+---------------+ 1 row in set, 288 warnings (0.00 sec)

And testing further on how the it works with the CRC32 algorithm, it seems like it doesn't matter where the "preferredAssembly" column is set to '1', as long as it is once and only once. I can set it to any row and the same hash will result:

 

mysql> update sr_versions_editions_src SET preferredAssembly = '0'; Query OK, 0 rows affected (0.01 sec) Rows matched: 288 Changed: 0 Warnings: 0 mysql> update sr_versions_editions_src SET preferredAssembly = '1' WHERE editionID = '30128782' limit 1; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e3a341be | +-----------+-----+----------+ 1 row in set (0.00 sec) mysql> update sr_versions_editions_src SET preferredAssembly = '0'; Query OK, 1 row affected (0.02 sec) Rows matched: 288 Changed: 1 Warnings: 0 mysql> update sr_versions_editions_src SET preferredAssembly = '1' WHERE editionID = '30115263' limit 1; Query OK, 1 row affected (0.02 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e3a341be | +-----------+-----+----------+ 1 row in set (0.00 sec) mysql> update sr_versions_editions_src SET preferredAssembly = '0'; Query OK, 1 row affected (0.00 sec) Rows matched: 288 Changed: 1 Warnings: 0 mysql> update sr_versions_editions_src SET preferredAssembly = '1' WHERE editionID = 'MATRIX2018' limit 1; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e3a341be | +-----------+-----+----------+ 1 row in set (0.00 sec)

 

In fact it seems be even worse, I can set THREE flags to preferredAssembly at random and get the same hash:

 

mysql> update sr_versions_editions_src SET preferredAssembly = '0'; Query OK, 1 row affected (0.01 sec) Rows matched: 288 Changed: 1 Warnings: 0 mysql> update sr_versions_editions_src SET preferredAssembly = '1' WHERE editionID IN ('30133832', 'MATRIX2011', 'TM-151') LIMIT 3; Query OK, 3 rows affected (0.01 sec) Rows matched: 3 Changed: 3 Warnings: 0 mysql> select editionID, musicID, versionID, preferredAssembly from sr_versions_editions_src WHERE preferredAssembly = '1'; +------------+---------+-----------+-------------------+ | editionID | musicID | versionID | preferredAssembly | +------------+---------+-----------+-------------------+ | 30133832 | 30652 | 136469 | 1 | | MATRIX2011 | 30652 | 136469 | 1 | | TM-151 | 30652 | 114916 | 1 | +------------+---------+-----------+-------------------+ 3 rows in set (0.01 sec) mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e3a341be | +-----------+-----+----------+ 1 row in set (0.00 sec)

 

But if I set only two I get a different hash:

mysql> update sr_versions_editions_src SET preferredAssembly = '1' WHERE editionID IN ('30133832', 'MATRIX2011', 'TM-151') LIMIT 2; Query OK, 2 rows affected (0.00 sec) Rows matched: 2 Changed: 2 Warnings: 0 mysql> select editionID, musicID, versionID, preferredAssembly from sr_versions_editions_src WHERE preferredAssembly = '1'; +------------+---------+-----------+-------------------+ | editionID | musicID | versionID | preferredAssembly | +------------+---------+-----------+-------------------+ | 30133832 | 30652 | 136469 | 1 | | MATRIX2011 | 30652 | 136469 | 1 | +------------+---------+-----------+-------------------+ 2 rows in set (0.00 sec) mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e2612b89 | +-----------+-----+----------+ 1 row in set (0.00 sec)

And setting FOUR at random gets the same result as setting TWO:

 

mysql> update sr_versions_editions_src SET preferredAssembly = '0'; Query OK, 2 rows affected (0.00 sec) Rows matched: 288 Changed: 2 Warnings: 0 mysql> update sr_versions_editions_src SET preferredAssembly = '1' WHERE editionID IN ('MATRIX2020', 'CC-111', 'BB-111', '30137299') LIMIT 4; Query OK, 4 rows affected (0.00 sec) Rows matched: 4 Changed: 4 Warnings: 0 mysql> select editionID, musicID, versionID, preferredAssembly from sr_versions_editions_src WHERE preferredAssembly = '1'; +------------+---------+-----------+-------------------+ | editionID | musicID | versionID | preferredAssembly | +------------+---------+-----------+-------------------+ | 30137299 | 30652 | 136469 | 1 | | BB-111 | 30652 | 114916 | 1 | | CC-111 | 30652 | 114916 | 1 | | MATRIX2020 | 30652 | 136469 | 1 | +------------+---------+-----------+-------------------+ 4 rows in set (0.00 sec) mysql> SELECT /*percona_sync_test.sr_versions_editions_src:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `editionid`, `musicid`, `versionid`, `versiontype`, `pdffilename`, `pdfdate`, `revisionid`, `rec_playlistid`, `rec_selecteditemid`, `eperms`, `editionspecifictitle`, `songid`, `kp_versiontoedition`, `pdf_status`, `isrc`, `expandeddescription`, `preferredassembly`, CONCAT(ISNULL(`eperms`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `percona_sync_test`.`sr_versions_editions_src` FORCE INDEX (`music_edition_version`) WHERE (1=1) LOCK IN SHARE MODE; +-----------+-----+----------+ | chunk_num | cnt | crc | +-----------+-----+----------+ | 0 | 288 | e2612b89 | +-----------+-----+----------+ 1 row in set (0.00 sec)

 

This is probably because the CRC hashing algorithm doesn't produce "random" results, so symmetrical changes to the data result in symmetrical changes to the hashes, unlike cryptographic hash functions like MD5 where the data has no relationship to the resulting hash, so when you do a BIT_XOR on the entire group of CRC32 hashes of the data, if data changed in one row but also changed in an equal and opposite way in another row, it can be invisible to the BIT_XOR. Not sure what the solution is because I don't know why CRC32 or BIT_XOR was chosen to begin with, but using a cryptographic hash function, rather than one which is just used for data integrity, seems like it would work.

 

 

Details

Assignee

Reporter

Priority

Affects versions

Smart Checklist

Created March 21, 2019 at 11:21 PM
Updated February 29, 2024 at 9:06 PM