Data loss with ALL_NON_LOSSY and replication between latin1->utf8mb4

Description

Description:
When replicating a table with latin1 character set on the source server to a table with utf8mb4 character set and conversion ALL_NON_LOSSY it is possible to miss the letter "ü"

How to repeat:
On the source:

CREATE TABLE `test1` ( `id` smallint NOT NULL, `testtext` varchar(100) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1

On the replica:

CREATE TABLE `test1` ( `id` smallint NOT NULL, `testtext` varchar(100) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

On the source:

insert into test1 (id, testtext) values(1, "zü"); select testtext, hex(testtext) from test1 where id=1; testtext hex(testtext) zü 7AFC

On the replica:

select testtext, hex(testtext) from test1 where id=1; testtext hex(testtext) z 7A

Or run the attached test case for MTR.

Suggested fix:
Do not loose the letter!

Environment

None

AFFECTED CS IDs

CS0015346

Attachments

3
  • 04 Feb 2021, 05:37 PM
  • 04 Feb 2021, 05:37 PM
  • 04 Feb 2021, 05:37 PM

Smart Checklist

Activity

Show:

Sveta Smirnova February 7, 2021 at 11:35 AM

This happens not only for the particular character "ü", but on characters with hex encoding >=0x80. Replica drops such a character and everything what goes after it. While with ALL_NON_LOSSY it should either fail with an error (because operation becomes lossy) or perform conversion (because latin1 is a subset of UTF8).

Details

Assignee

Reporter

Labels

Affects versions

Priority

Smart Checklist

Created February 4, 2021 at 5:27 PM
Updated March 6, 2024 at 10:43 AM