Issues

Select view

Select search mode

 

pt-online-schema-change can loose data if the charset do not match

Description

hi.

this is a bug report for pt-osc, see below for explanation of the issue.

suggested fixes are:

  • pt-osc check charset of table and columns and error immediately if config/arguments will use incompatible charset that will result in data loss.

  • pt-osc check count of rows in table and _table_new to ensure they match before swapping (ideally also compare checksum of columns in common, e.g. via pt-table-checksum method).

  • pt-osc error immediately if a nibble found no rows when the query to determine the upper boundary immediately prior ensures that the rows do exist.

 

PT_DEBUG=1 output: # /usr/bin/perl 5.022001  # Linux recovery1 4.4.0-1107-aws #118-Ubuntu SMP Sun May 3 23:28:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux  # Arguments: _[--config=/tmp/ptosc.cnf]_ _[--host=recovery1]_ _[--port=3306]_ _[--alter=ADD COLUMN collection_id bigint(20) unsigned NOT NULL DEFAULT 0, ADD COLUMN collection_item_id bigint(20) unsigned NOT NULL DEFAULT 0, ADD INDEX team_collection_id (team_id, collection_id)]_ _[--max-lag=1]_ _[--max-load=Threads_running=50]_ _[--critical-load=Threads_running=300]_ _[--chunk-time=1]_ _[--pause-file  =/tmp/pause.ptosc]_ _[--no-check-replication-filters]_ _[--execute]_ _[--nodrop-old-table]_ _[D=test_db,t=test_tbl]_ ... # OptionParser:254 82474 pt-online-schema-change 3.0.2 ... # OptionParser:367 82474 Parsing opt spec: group => default spec => charset|A=s attributes => HASH(0x1ee34f8) desc => Default character set  # OptionParser:406 82474 charset type: s ... # Connection info: recovery1 via TCP/IP Character set info: $VAR1 = [  # {  # Value => 'latin1',  # Variable_name => 'character_set_client'  # },  # {  # Value => 'latin1',  # Variable_name => 'character_set_connection'  # },  # {  # Value => 'utf8mb4',  # Variable_name => 'character_set_database'  # },  # {  # Value => 'binary',  # Variable_name => 'character_set_filesystem'  # },  # {  # Value => 'latin1',  # Variable_name => 'character_set_results'  # },  # {  # Value => 'utf8mb4',  # Variable_name => 'character_set_server'  # },  # {  # Value => 'utf8',  # Variable_name => 'character_set_system'  # },  # {  # Value => '/usr/share/mysql/charsets/',  # Variable_name => 'character_sets_dir'  # }  # ]; ... # TableParser:3208 82474 SHOW CREATE TABLE `test_tbl`.`test_db`  # TableParser:3218 82474 /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */  # TableParser:3402 82474 Storage engine: InnoDB  # TableParser:3250 82474 Table cols: `team_id`, `user_id`, `date_create`, `name`, `data`, `is_alias`, `mimetype`, `alias`  # TableParser:3402 82474 Storage engine: InnoDB  # TableParser:3418 82474 Parsed key: PRIMARY KEY (`team_id`,`name`),  # TableParser:3438 82474 PRIMARY key cols: `team_id`, `name`  # TableParser:3460 82474 This key is the clustered key  # TableParser:3418 82474 Parsed key: KEY `team_id` (`team_id`,`date_create`),  # TableParser:3438 82474 team_id key cols: `team_id`, `date_create`  # TableParser:3418 82474 Parsed key: KEY `aliases` (`team_id`,`is_alias`)  # TableParser:3438 82474 aliases key cols: `team_id`, `is_alias`  # NibbleIterator:5885 82474 EXPLAIN SELECT * FROM `test_db`.`test_tbl` WHERE 1=1 >>>> NOTE THAT IT DOES NOT CHECK THE CHARSET OF THE TABLE! CREATE TABLE `test_tbl` (  `team_id` bigint(20) unsigned NOT NULL, -- PK first col  `date_create` int(10) unsigned NOT NULL,  `name` varchar(100) NOT NULL, -- PK 2nd col  `is_alias` tinyint(3) unsigned NOT NULL DEFAULT '0',  PRIMARY KEY (`team_id`,`name`),  KEY `team_id` (`team_id`,`date_create`),  KEY `aliases` (`team_id`,`is_alias`),  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; >>> SO WHEN IT ENCOUNTERS NON-LATIN1 CHARS, THEY BECOME '?', LITERALLY: # NibbleIterator:6006 82474 SELECT /*!40001 SQL_NO_CACHE */ `team_id`, `team_id`, `name` FROM `test_db`.`test_tbl` FORCE INDEX(`PRIMARY`) WHERE ((`team_id` > ?) OR (`team_id` = ? AND `name` >= ?)) ORDER BY `team_id`, `name` LIMIT ?, 2 /*next chunk boundary*/ params: 123456789, 123456789, smile 4611  ...  # NibbleIterator:6026 82474 Last upper boundary: $VAR1 = [  # '123456789',  # '123456789',  # '???'  # ];  ...  # Quoter:2819 82474 Serializing $VAR1 = [  # '123456789',  # '123456789',  # 'smile'  # ];  #  # Quoter:2835 82474 Serialized: < 595484798513,595484798513,smile >  # Quoter:2819 82474 Serializing $VAR1 = [  # '123456789',  # '123456789',  # '???'  # ];  #  # Quoter:2835 82474 Serialized: < 123456789,123456789,??? >  # Retry:3751 82474 Try 1 of 10  # pt_online_schema_change:10946 82474 INSERT LOW_PRIORITY IGNORE INTO `test_db`.`_test_tbl_new` (`team_id`, `user_id`, `date_create`, `name`, `data`, `is_alias`, `mimetype`, `alias`) SELECT `team_id`, `user_id`, `date_create`, `name`, `data`, `is_alias`, `mimetype`, `alias` FROM `test_db`.`test_tbl` FORCE INDEX(`PRIMARY`) WHERE ((`team_id` > ?) OR (`team_id` = ? AND `name` >= ?)) AND ((`team_id` < ?) OR (`team_id` = ? AND `name` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 82474 copy nibble*/ lower boundary: 123456789 123456789 smile upper boundary: 123456789 123456789 ???  # pt_online_schema_change:10965 82474 SHOW WARNINGS  # Retry:3768 82474 Try code succeeded  # pt_online_schema_change:9453 82474 Nibble time: 0.459803104400635  # NibbleIterator:5615 82474 0 rows in nibble 31  # NibbleIterator:5627 82474 No rows in nibble or nibble skipped mysql> SET character_set_results = 'utf8mb4'; SELECT CAST(UNHEX('E3839E') AS CHAR);  Query OK, 0 rows affected (0.00 sec)  +-------------------------------+  | CAST(UNHEX('E3839E') AS CHAR) |  +-------------------------------+  | マ |  +-------------------------------+  1 row in set (0.00 sec) mysql> SET character_set_results = 'latin1'; SELECT CAST(UNHEX('E3839E') AS CHAR);  Query OK, 0 rows affected (0.00 sec)  +-------------------------------+  | CAST(UNHEX('E3839E') AS CHAR) |  +-------------------------------+  | ? |  +-------------------------------+  1 row in set (0.00 sec)

 

Environment

None

AFFECTED CS IDs

276052

Smart Checklist

Details

Assignee

Reporter

Priority

Affects versions

Smart Checklist

Created June 12, 2020 at 7:51 PM
Updated February 29, 2024 at 8:58 PM

Activity

Sveta SmirnovaJune 13, 2020 at 7:47 AM
Edited

Than you for the report.

Verified as described.

Indeed, pt-online-schema-change can corrupt data if it's default character set does not match character set for the key it uses to chunk rows in the table.

Workaround: use option --charset

Sveta SmirnovaJune 13, 2020 at 7:46 AM

How to repeat:

sveta@Delly-7390:~/build/ps-5.7/mysql-test$ ./mtr --start innodb & ... mysql> CREATE TABLE `test_tbl` ( `team_id` bigint(20) unsigned NOT NULL, `date_create` int(10) unsigned NOT NULL, `name` varchar(100) NOT NULL, `is_alias` tinyint(3) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`team_id`,`name`), KEY `team_id` (`team_id`,`date_create`), KEY `aliases` (`team_id`,`is_alias`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; Query OK, 0 rows affected (0.03 sec) mysql> \d | mysql> create procedure p1() begin set @i=0; repeat insert into test_tbl values(@i, @i, if(@i%2 = 0, "Света", "Тибор"), @i); set @i=@i+1; until @i > 255 end repeat; end | Query OK, 0 rows affected (0.00 sec) mysql> call p1() -> | Query OK, 0 rows affected (0.70 sec) mysql> select * from test_tbl| +---------+-------------+------------+----------+ | team_id | date_create | name | is_alias | +---------+-------------+------------+----------+ | 0 | 0 | Света | 0 | | 1 | 1 | Тибор | 1 | | 2 | 2 | Света | 2 | | 3 | 3 | Тибор | 3 | ... 256 rows in set (0.01 sec) sveta@Delly-7390:~/build/ps-5.7/mysql-test$ /home/sveta/src/percona-toolkit/bin/pt-online-schema-change --alter="ADD COLUMN collection_id bigint(20) unsigned NOT NULL DEFAULT 0" h=127.0.0.1,P=13001,u=root,D=test,t=test_tbl --chunk-size=1 --execute mysql> select * from test_tbl; Empty set (0.01 sec)