Issues
- pt-online-schema-change can loose data if the charset do not matchPT-1861
- Add option to pt-upgrade to display all timing informationPT-176
- pt-table-checksum shows diffs when table has columns with different collation/charsetPT-136Resolved issue: PT-136Carlos Salguero
- pt-table-sync: Cannot nibble tablePT-134
- Sandbox won't start correctly if autocommit=0 in my.cnfPT-133Resolved issue: PT-133Carlos Salguero
- ps output of pt-stalk should include memory usagePT-128Resolved issue: PT-128Agustin Gallego
- pt-online-schema-change wrongly thinks columns are trying to be renamedPT-126Resolved issue: PT-126Carlos Salguero
- pt-mysql-summary does not work with MariaDB 10.2.5PT-120
- pt-online-schema change eats data on adding a unique indexPT-116Resolved issue: PT-116Carlos Salguero
- Make OptionParser to accept repeatable DSNsPT-115Resolved issue: PT-115Carlos Salguero
- pt-online-schema change eats data on adding a unique indexPT-110Resolved issue: PT-110Carlos Salguero
- Make pt-osc compatible with AFTER triggersPT-91Resolved issue: PT-91Carlos Salguero
- Collect information about prepared statements if P_S is enabledPT-90Resolved issue: PT-90Agustin Gallego
13 of 13
pt-online-schema-change can loose data if the charset do not match
General
Escalation
General
Escalation
Description
Environment
None
AFFECTED CS IDs
276052
Smart Checklist
Details
Assignee
UnassignedUnassignedReporter
Tibor Korocz (Percona)Tibor Korocz (Percona)Priority
MediumAffects versions
Details
Details
Assignee
Unassigned
UnassignedReporter
Tibor Korocz (Percona)
Tibor Korocz (Percona)Priority
Affects versions
Smart Checklist
Smart Checklist
Smart Checklist
Created June 12, 2020 at 7:51 PM
Updated February 29, 2024 at 8:58 PM
Activity
Sveta SmirnovaJune 13, 2020 at 7:47 AMEdited
Than you for the report.
Verified as described.
Indeed, pt-online-schema-change can corrupt data if it's default character set does not match character set for the key it uses to chunk rows in the table.
Workaround: use option --charset
Sveta SmirnovaJune 13, 2020 at 7:46 AM
How to repeat:
sveta@Delly-7390:~/build/ps-5.7/mysql-test$ ./mtr --start innodb &
...
mysql> CREATE TABLE `test_tbl` ( `team_id` bigint(20) unsigned NOT NULL, `date_create` int(10) unsigned NOT NULL, `name` varchar(100) NOT NULL, `is_alias` tinyint(3) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`team_id`,`name`), KEY `team_id` (`team_id`,`date_create`), KEY `aliases` (`team_id`,`is_alias`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.03 sec)
mysql> \d |
mysql> create procedure p1() begin set @i=0; repeat insert into test_tbl values(@i, @i, if(@i%2 = 0, "Света", "Тибор"), @i); set @i=@i+1; until @i > 255 end repeat; end |
Query OK, 0 rows affected (0.00 sec)
mysql> call p1()
-> |
Query OK, 0 rows affected (0.70 sec)
mysql> select * from test_tbl|
+---------+-------------+------------+----------+
| team_id | date_create | name | is_alias |
+---------+-------------+------------+----------+
| 0 | 0 | Света | 0 |
| 1 | 1 | Тибор | 1 |
| 2 | 2 | Света | 2 |
| 3 | 3 | Тибор | 3 |
...
256 rows in set (0.01 sec)
sveta@Delly-7390:~/build/ps-5.7/mysql-test$ /home/sveta/src/percona-toolkit/bin/pt-online-schema-change --alter="ADD COLUMN collection_id bigint(20) unsigned NOT NULL DEFAULT 0" h=127.0.0.1,P=13001,u=root,D=test,t=test_tbl --chunk-size=1 --execute
mysql> select * from test_tbl;
Empty set (0.01 sec)
hi.
this is a bug report for pt-osc, see below for explanation of the issue.
suggested fixes are:
pt-osc check charset of table and columns and error immediately if config/arguments will use incompatible charset that will result in data loss.
pt-osc check count of rows in table and _table_new to ensure they match before swapping (ideally also compare checksum of columns in common, e.g. via pt-table-checksum method).
pt-osc error immediately if a nibble found no rows when the query to determine the upper boundary immediately prior ensures that the rows do exist.
PT_DEBUG=1 output: # /usr/bin/perl 5.022001 # Linux recovery1 4.4.0-1107-aws #118-Ubuntu SMP Sun May 3 23:28:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # Arguments: _[--config=/tmp/ptosc.cnf]_ _[--host=recovery1]_ _[--port=3306]_ _[--alter=ADD COLUMN collection_id bigint(20) unsigned NOT NULL DEFAULT 0, ADD COLUMN collection_item_id bigint(20) unsigned NOT NULL DEFAULT 0, ADD INDEX team_collection_id (team_id, collection_id)]_ _[--max-lag=1]_ _[--max-load=Threads_running=50]_ _[--critical-load=Threads_running=300]_ _[--chunk-time=1]_ _[--pause-file =/tmp/pause.ptosc]_ _[--no-check-replication-filters]_ _[--execute]_ _[--nodrop-old-table]_ _[D=test_db,t=test_tbl]_ ... # OptionParser:254 82474 pt-online-schema-change 3.0.2 ... # OptionParser:367 82474 Parsing opt spec: group => default spec => charset|A=s attributes => HASH(0x1ee34f8) desc => Default character set # OptionParser:406 82474 charset type: s ... # Connection info: recovery1 via TCP/IP Character set info: $VAR1 = [ # { # Value => 'latin1', # Variable_name => 'character_set_client' # }, # { # Value => 'latin1', # Variable_name => 'character_set_connection' # }, # { # Value => 'utf8mb4', # Variable_name => 'character_set_database' # }, # { # Value => 'binary', # Variable_name => 'character_set_filesystem' # }, # { # Value => 'latin1', # Variable_name => 'character_set_results' # }, # { # Value => 'utf8mb4', # Variable_name => 'character_set_server' # }, # { # Value => 'utf8', # Variable_name => 'character_set_system' # }, # { # Value => '/usr/share/mysql/charsets/', # Variable_name => 'character_sets_dir' # } # ]; ... # TableParser:3208 82474 SHOW CREATE TABLE `test_tbl`.`test_db` # TableParser:3218 82474 /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */ # TableParser:3402 82474 Storage engine: InnoDB # TableParser:3250 82474 Table cols: `team_id`, `user_id`, `date_create`, `name`, `data`, `is_alias`, `mimetype`, `alias` # TableParser:3402 82474 Storage engine: InnoDB # TableParser:3418 82474 Parsed key: PRIMARY KEY (`team_id`,`name`), # TableParser:3438 82474 PRIMARY key cols: `team_id`, `name` # TableParser:3460 82474 This key is the clustered key # TableParser:3418 82474 Parsed key: KEY `team_id` (`team_id`,`date_create`), # TableParser:3438 82474 team_id key cols: `team_id`, `date_create` # TableParser:3418 82474 Parsed key: KEY `aliases` (`team_id`,`is_alias`) # TableParser:3438 82474 aliases key cols: `team_id`, `is_alias` # NibbleIterator:5885 82474 EXPLAIN SELECT * FROM `test_db`.`test_tbl` WHERE 1=1 >>>> NOTE THAT IT DOES NOT CHECK THE CHARSET OF THE TABLE! CREATE TABLE `test_tbl` ( `team_id` bigint(20) unsigned NOT NULL, -- PK first col `date_create` int(10) unsigned NOT NULL, `name` varchar(100) NOT NULL, -- PK 2nd col `is_alias` tinyint(3) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`team_id`,`name`), KEY `team_id` (`team_id`,`date_create`), KEY `aliases` (`team_id`,`is_alias`), ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; >>> SO WHEN IT ENCOUNTERS NON-LATIN1 CHARS, THEY BECOME '?', LITERALLY: # NibbleIterator:6006 82474 SELECT /*!40001 SQL_NO_CACHE */ `team_id`, `team_id`, `name` FROM `test_db`.`test_tbl` FORCE INDEX(`PRIMARY`) WHERE ((`team_id` > ?) OR (`team_id` = ? AND `name` >= ?)) ORDER BY `team_id`, `name` LIMIT ?, 2 /*next chunk boundary*/ params: 123456789, 123456789, smile 4611 ... # NibbleIterator:6026 82474 Last upper boundary: $VAR1 = [ # '123456789', # '123456789', # '???' # ]; ... # Quoter:2819 82474 Serializing $VAR1 = [ # '123456789', # '123456789', # 'smile' # ]; # # Quoter:2835 82474 Serialized: < 595484798513,595484798513,smile > # Quoter:2819 82474 Serializing $VAR1 = [ # '123456789', # '123456789', # '???' # ]; # # Quoter:2835 82474 Serialized: < 123456789,123456789,??? > # Retry:3751 82474 Try 1 of 10 # pt_online_schema_change:10946 82474 INSERT LOW_PRIORITY IGNORE INTO `test_db`.`_test_tbl_new` (`team_id`, `user_id`, `date_create`, `name`, `data`, `is_alias`, `mimetype`, `alias`) SELECT `team_id`, `user_id`, `date_create`, `name`, `data`, `is_alias`, `mimetype`, `alias` FROM `test_db`.`test_tbl` FORCE INDEX(`PRIMARY`) WHERE ((`team_id` > ?) OR (`team_id` = ? AND `name` >= ?)) AND ((`team_id` < ?) OR (`team_id` = ? AND `name` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 82474 copy nibble*/ lower boundary: 123456789 123456789 smile upper boundary: 123456789 123456789 ??? # pt_online_schema_change:10965 82474 SHOW WARNINGS # Retry:3768 82474 Try code succeeded # pt_online_schema_change:9453 82474 Nibble time: 0.459803104400635 # NibbleIterator:5615 82474 0 rows in nibble 31 # NibbleIterator:5627 82474 No rows in nibble or nibble skipped mysql> SET character_set_results = 'utf8mb4'; SELECT CAST(UNHEX('E3839E') AS CHAR); Query OK, 0 rows affected (0.00 sec) +-------------------------------+ | CAST(UNHEX('E3839E') AS CHAR) | +-------------------------------+ | マ | +-------------------------------+ 1 row in set (0.00 sec) mysql> SET character_set_results = 'latin1'; SELECT CAST(UNHEX('E3839E') AS CHAR); Query OK, 0 rows affected (0.00 sec) +-------------------------------+ | CAST(UNHEX('E3839E') AS CHAR) | +-------------------------------+ | ? | +-------------------------------+ 1 row in set (0.00 sec)