pt-archiver --charset option is not working for MySQL 8.0

Description

Source and destination MySQL:

 

pt-archiver command

--charset=latin1 is not working because pt-archiver forces SET NAMES utf8mb4 (see the related code I found). May I know why pt-archiver does this, and is there any workaround for tables using latin1 in MySQL 8? Is it safe to comment out this part of the code?

 

Apart from the above question, I also found that when the ANSI sql_mode is enabled, --check-charset fails because `"` is recognised as an identifier quote. Is it safe to replace the double quotes with single quotes? Is there any potential issue when using pt-archiver with ANSI enabled?

As attached: Screenshot 2022-07-25 at 10.59.54 AM.png

Thanks for your help.

Environment

None

AFFECTED CS IDs

CS0036760

Attachments

3

Activity

Sveta Smirnova December 7, 2023 at 4:50 PM

The ANSI_QUOTES case should be fixed by https://jira.percona.com/browse/PT-2207. Please check and open a separate bug if the issue still exists for you. You need to use version 3.5.6 (not released) or download the tool from GitHub to test.

Jaime Sicam May 31, 2023 at 4:07 AM

This is the code that forces utf8mb4 on MySQL 8.0 even if you change the charset parameter:

If the commented-out #if statement is used, then the charset chosen for MySQL 8.0 could be used.
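To illustrate the behaviour being discussed, here is a minimal sketch in Python. It is not pt-archiver's code (the tool is written in Perl, and the original snippet is not reproduced in this ticket); the function name `effective_charset` and the `honor_user_choice` flag are hypothetical and only model the condition `&& !$o->get('charset')` mentioned in this thread.

```python
from typing import Optional


def effective_charset(server_version: str,
                      requested_charset: Optional[str],
                      honor_user_choice: bool = True) -> Optional[str]:
    """Hypothetical model of the charset selection described in this ticket.

    honor_user_choice=True  models the condition with `!$o->get('charset')`:
                            utf8mb4 is only a default on MySQL 8.0+.
    honor_user_choice=False models the reported behaviour: utf8mb4 is forced
                            on MySQL 8.0+ even when --charset is given.
    """
    # Compare only the major and minor parts, e.g. "8.0.28-19" -> (8, 0).
    major_minor = tuple(int(part) for part in server_version.split(".")[:2])
    is_mysql8_or_later = major_minor >= (8, 0)

    if is_mysql8_or_later and (not honor_user_choice or not requested_charset):
        return "utf8mb4"
    return requested_charset


# Reported behaviour: --charset=latin1 is overridden on 8.0.
print(effective_charset("8.0.28", "latin1", honor_user_choice=False))
# With the extra condition, the user's choice is kept.
print(effective_charset("8.0.28", "latin1", honor_user_choice=True))
```

With the extra condition in place, utf8mb4 acts as a default for MySQL 8.0 rather than an override, which appears to be what the reporter is asking for.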

Aaditya Dubey August 12, 2022 at 12:47 PM

Hi,

Thank you for the report.
I tried to debug the issue, and it looks like the tool does not honor the charset even when it is specified. Please find my test case below:

 

Character set mismatch: --source DSN uses utf8mb4, table uses latin1.  You can disable this check by specifying --no-check-charset. 
--charset=latin1 is not working because pt-archiver forces SET NAMES utf8mb4 (see the related code I found). May I know why pt-archiver does this, and is there any workaround for tables using latin1 in MySQL 8? Is it safe to comment out this part of the code?

--[no]check-charset

default: yes

Ensure connection and table character sets are the same. Disabling this check may cause text to be erroneously converted from one character set to another (usually from utf8 to latin1) which may cause data loss or mojibake. Disabling this check may be useful or necessary when character set conversions are intended.
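As a rough illustration of what this check amounts to, the comparison can be sketched in Python as below. This is an assumption-laden sketch, not the tool's implementation: the function name `check_charset` is hypothetical, and how pt-archiver actually reads the connection and table character sets is not shown in this ticket. Only the message text is taken from the error quoted above.

```python
from typing import Optional


def check_charset(dsn_label: str,
                  connection_charset: str,
                  table_charset: str) -> Optional[str]:
    """Return an error message when the two character sets differ, else None.

    Hypothetical helper; the message mirrors the error quoted in this ticket.
    """
    if connection_charset.lower() != table_charset.lower():
        return (
            f"Character set mismatch: {dsn_label} DSN uses {connection_charset}, "
            f"table uses {table_charset}.  You can disable this check by "
            "specifying --no-check-charset."
        )
    return None


# The situation from the report: connection forced to utf8mb4, table is latin1.
print(check_charset("--source", "utf8mb4", "latin1"))
```

If the connection really used latin1, as requested with --charset=latin1, the two values would match and the check would pass without needing --no-check-charset.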

 

Apart from the above question, I also found that when the ANSI sql_mode is enabled, --check-charset fails because `"` is recognised as an identifier quote. Is it safe to replace the double quotes with single quotes? Is there any potential issue when using pt-archiver with ANSI enabled?

ANSI

Equivalent to REAL_AS_FLOAT, PIPES_AS_CONCAT, ANSI_QUOTES, IGNORE_SPACE, and ONLY_FULL_GROUP_BY.

ANSI_QUOTES

Treat " as an identifier quote character (like the ` quote character) and not as a string quote character. You can still use ` to quote identifiers with this mode enabled. With ANSI_QUOTES enabled, you cannot use double quotation marks to quote literal strings because they are interpreted as identifiers.
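Because single quotes always delimit string literals in MySQL, with or without ANSI_QUOTES, a query built with single-quoted literals should behave the same in both modes. The sketch below shows the idea in Python; the helper `sql_string_literal` and the example query are hypothetical and are not taken from pt-archiver. It assumes the default backslash-escape behaviour (NO_BACKSLASH_ESCAPES not set); in real code, driver-side parameter binding is usually the safer choice.

```python
def sql_string_literal(value: str) -> str:
    """Quote a value as a single-quoted MySQL string literal.

    Hypothetical helper. Assumes backslash escapes are enabled (the default),
    so backslashes are doubled, and embedded single quotes are doubled too.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


# Example query text only; not the statement pt-archiver actually issues.
query = "SHOW VARIABLES LIKE " + sql_string_literal("character_set_connection")
print(query)
```

A literal written this way contains no double quotes, so ANSI_QUOTES has nothing to reinterpret as an identifier.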

 

Sending the concern to engineering for further review and updates.

Yijian Zhang July 25, 2022 at 3:48 AM

We also found that pt-archiver forces utf8mb4 even when --charset=latin1 is already specified. May I know why the original condition && !$o->get('charset') was removed?

Done

Created July 25, 2022 at 3:03 AM
Updated March 8, 2024 at 12:33 PM
Resolved December 7, 2023 at 6:42 PM