Data loss with ALL_NON_LOSSY and replication between latin1->utf8mb4
General
Escalation
General
Escalation
Description
Description: When replicating a table with latin1 character set on the source server to a table with utf8mb4 character set and conversion ALL_NON_LOSSY it is possible to miss the letter "ü"
How to repeat: On the source:
CREATE TABLE `test1` (
`id` smallint NOT NULL,
`testtext` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
On the replica:
CREATE TABLE `test1` (
`id` smallint NOT NULL,
`testtext` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
On the source:
insert into test1 (id, testtext) values(1, "zü");
select testtext, hex(testtext) from test1 where id=1;
testtext hex(testtext)
zü 7AFC
On the replica:
select testtext, hex(testtext) from test1 where id=1;
testtext hex(testtext)
z 7A
Or run the attached test case for MTR.
Suggested fix: Do not loose the letter!
Environment
None
AFFECTED CS IDs
CS0015346
Attachments
3
04 Feb 2021, 05:37 PM
04 Feb 2021, 05:37 PM
04 Feb 2021, 05:37 PM
Smart Checklist
Activity
Show:
Sveta Smirnova February 7, 2021 at 11:35 AM
This happens not only for the particular character "ü", but on characters with hex encoding >=0x80. Replica drops such a character and everything what goes after it. While with ALL_NON_LOSSY it should either fail with an error (because operation becomes lossy) or perform conversion (because latin1 is a subset of UTF8).
Description:
When replicating a table with latin1 character set on the source server to a table with utf8mb4 character set and conversion ALL_NON_LOSSY it is possible to miss the letter "ü"
How to repeat:
On the source:
CREATE TABLE `test1` ( `id` smallint NOT NULL, `testtext` varchar(100) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
On the replica:
CREATE TABLE `test1` ( `id` smallint NOT NULL, `testtext` varchar(100) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
On the source:
insert into test1 (id, testtext) values(1, "zü"); select testtext, hex(testtext) from test1 where id=1; testtext hex(testtext) zü 7AFC
On the replica:
select testtext, hex(testtext) from test1 where id=1; testtext hex(testtext) z 7A
Or run the attached test case for MTR.
Suggested fix:
Do not loose the letter!