pt-archiver --charset option is not working for MySQL 8.0
Activity

Sveta Smirnova December 7, 2023 at 4:50 PM
The ANSI_QUOTES case should be fixed by https://jira.percona.com/browse/PT-2207. Please check, and open a separate bug if the issue still exists for you. To test, you need version 3.5.6 (not yet released) or the tool downloaded from GitHub.

Jaime Sicam May 31, 2023 at 4:07 AM
This is the code that forces utf8mb4 on MySQL 8.0 even if you change the charset parameter:
If the commented-out #if statement were used instead, the charset chosen with --charset could be used on MySQL 8.0.
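The code referred to above is not reproduced in this thread. What follows is only a hypothetical Python sketch of the decision it is described as making. `choose_connection_charset` and `honor_user_charset` are illustrative names and do not come from pt-archiver's Perl source.

- `honor_user_charset=False` models the reported behaviour: utf8mb4 is forced on MySQL 8.0.
- `honor_user_charset=True` models the `&& !$o->get('charset')` style condition raised in the original report: utf8mb4 is used only as a fallback when no --charset is given.

```python
# Hypothetical sketch of the connection-charset decision discussed above.
# This is NOT pt-archiver's Perl source; all names are illustrative only.
from typing import Optional, Tuple


def choose_connection_charset(server_version: Tuple[int, ...],
                              user_charset: Optional[str],
                              honor_user_charset: bool = True) -> Optional[str]:
    """Return the charset to use for SET NAMES, or None to keep the default.

    honor_user_charset=False: utf8mb4 is forced on MySQL 8.0 even when
    --charset was given (the behaviour reported in this issue).
    honor_user_charset=True: utf8mb4 is only a fallback on MySQL 8.0 when
    no --charset was given (the suggested condition).
    """
    is_mysql8 = server_version >= (8, 0)
    if is_mysql8 and (not honor_user_charset or not user_charset):
        return "utf8mb4"
    return user_charset
```

For example, with a MySQL 8.0.30 server and --charset=latin1, the first mode returns utf8mb4 and the second returns latin1.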

Aaditya Dubey August 12, 2022 at 12:47 PM
Hi,
Thank you for the report.
I tried to debug the issue, and it looks like the charset is not taken into account even when it is specified. Please find my test case below:
Character set mismatch: --source DSN uses utf8mb4, table uses latin1. You can disable this check by specifying --no-check-charset.
--charset=latin1 is not working because pt-archiver forces SET NAMES utf8mb4 (I found the related code change). May I know why pt-archiver does this, and is there any workaround for tables using latin1 on MySQL 8 (is it safe to comment out this part of the code)?
--[no]check-charset
default: yes
Ensure connection and table character sets are the same. Disabling this check may cause text to be erroneously converted from one character set to another (usually from utf8 to latin1) which may cause data loss or mojibake. Disabling this check may be useful or necessary when character set conversions are intended.
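To make the check described above concrete, here is a hypothetical Python sketch; it is not pt-archiver's implementation. It compares the connection and table character sets and produces a message in the same form as the error in the test case. It also shows the kind of mojibake the documentation warns about, by decoding UTF-8 bytes as if they were latin1.

```python
# Hypothetical sketch of a --check-charset style comparison; not pt-archiver code.
from typing import Optional


def check_charset(dsn_charset: str, table_charset: str,
                  check: bool = True) -> Optional[str]:
    """Return an error message if the charsets differ and checking is enabled."""
    if check and dsn_charset.lower() != table_charset.lower():
        return (f"Character set mismatch: --source DSN uses {dsn_charset}, "
                f"table uses {table_charset}. You can disable this check by "
                f"specifying --no-check-charset.")
    return None


# Mojibake: UTF-8 bytes interpreted as latin1 text.
# Python's "latin-1" codec is ISO-8859-1; MySQL's latin1 is cp1252, which maps
# these particular bytes (0xC3, 0xA9) to the same characters.
original = "café"
garbled = original.encode("utf-8").decode("latin-1")
```

Here `garbled` is `cafÃ©`: the two-byte UTF-8 sequence for "é" becomes two separate latin1 characters.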
Apart from the above question, I also found that when the ANSI sql_mode is enabled, --check-charset fails because `"` is recognised as an identifier quote character. Is it safe to replace the double quotes with single quotes? And are there any potential issues when using pt-archiver with ANSI mode enabled?
ANSI
Equivalent to REAL_AS_FLOAT, PIPES_AS_CONCAT, ANSI_QUOTES, IGNORE_SPACE, and ONLY_FULL_GROUP_BY.
ANSI_QUOTES
Treat " as an identifier quote character (like the ` quote character) and not as a string quote character. You can still use ` to quote identifiers with this mode enabled. With ANSI_QUOTES enabled, you cannot use double quotation marks to quote literal strings because they are interpreted as identifiers.
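On the question of replacing double quotes with single quotes: in MySQL, single-quoted strings are string literals whether or not ANSI_QUOTES is set, and backtick-quoted identifiers work in both modes. The hypothetical Python helpers below (not pt-archiver code) sketch a mode-independent way to build such a query. The information_schema query is only an example; it is not necessarily the query pt-archiver runs.

```python
# Hypothetical helpers (not pt-archiver code) for building SQL that parses
# the same way whether or not ANSI_QUOTES is in sql_mode.


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value: str) -> str:
    """Quote a string literal with single quotes, doubling embedded ones.

    Backslashes are rejected because their meaning depends on whether
    NO_BACKSLASH_ESCAPES is set; pass such values as bound parameters instead.
    """
    if "\\" in value:
        raise ValueError("backslash in literal; use a bound parameter")
    return "'" + value.replace("'", "''") + "'"


def table_collation_query(db: str, table: str) -> str:
    # A double-quoted value such as TABLE_NAME = "t1" would be read as a
    # column named t1 under ANSI_QUOTES; single quotes are always literals.
    return ("SELECT TABLE_COLLATION FROM information_schema.TABLES "
            f"WHERE TABLE_SCHEMA = {quote_literal(db)} "
            f"AND TABLE_NAME = {quote_literal(table)}")
```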
Sending this concern to engineering for further review and updates.

Yijian Zhang July 25, 2022 at 3:48 AM
We also found that pt-archiver forces utf8mb4 even when --charset=latin1 is already specified. May I know why the original condition `&& !$o->get('charset')` was removed?
Source and destination MySQL:
pt-archiver command:
--charset=latin1 is not working because pt-archiver forces SET NAMES utf8mb4 (I found the related code change). May I know why pt-archiver does this, and is there any workaround for tables using latin1 on MySQL 8 (is it safe to comment out this part of the code)?
Apart from the above question, I also found that when the ANSI sql_mode is enabled, --check-charset fails because `"` is recognised as an identifier quote character. Is it safe to replace the double quotes with single quotes? And are there any potential issues when using pt-archiver with ANSI mode enabled?
See the attached Screenshot 2022-07-25 at 10.59.54 AM.png.
Thanks for your help.