[pt-archiver] charset mismatch and invalid Cyrillic chars
Description
Hello. I am trying to dump a TSV file with pt-archiver.
pt-archiver --no-delete --source h=10.2.0.103,D=stat,t=CallStatV4 --limit=1000 --where "id > 1350363818" --file load_to_clickhouse.tsv
I get an error:
Character set mismatch: --source DSN uses latin1, table uses utf8. You can disable this check by specifying --no-check-charset.
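For readers following along: the error message talks about the character set of the connection opened through the --source DSN, which appears to be separate from the schema and table defaults. The DSN above sets only the h, D and t keys, while Percona Toolkit documents an A key for the DSN's default character set. As a rough illustration only (the helper parse_dsn is hypothetical and not part of pt-archiver), this Python sketch splits the DSN from the command and checks whether a charset key is present:

```python
def parse_dsn(dsn: str) -> dict:
    """Split a comma-separated key=value DSN string into a dict.

    Hypothetical helper for illustration; pt-archiver has its own parser.
    """
    parts = {}
    for item in dsn.split(","):
        key, _, value = item.partition("=")
        parts[key.strip()] = value.strip()
    return parts


# DSN copied from the command in this report.
dsn = parse_dsn("h=10.2.0.103,D=stat,t=CallStatV4")

# Assumption: "A" is the DSN key for the connection character set.
has_charset = "A" in dsn

print(dsn)
print("charset key present:", has_charset)
```

Whether adding a charset to the DSN also fixes the garbled output is not confirmed anywhere in this report.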
mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = "stat";
+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8                       |
+----------------------------+
mysql> SELECT CCSA.character_set_name FROM information_schema.`TABLES` T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "stat" AND T.table_name = "CallStatV4";
+--------------------+
| character_set_name |
+--------------------+
| utf8               |
+--------------------+
If I pass the option --charset=utf8 to pt-archiver, there is no error, but Cyrillic characters in the file are invalid, like this: "name": "Ðнжелика"
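The leading "Ð" in that value is what the first byte of a UTF-8 encoded Cyrillic letter looks like when it is read as latin1. A minimal Python sketch of that byte-level mechanism follows; it assumes the stored name is "Анжелика" (a guess from the visible letters) and the repair helper is hypothetical. It does not reproduce the exact output above, where only the first letter is affected, and it says nothing about where in pt-archiver the conversion happens.

```python
# Assumption: the name stored in the table is "Анжелика".
original = "Анжелика"
utf8_bytes = original.encode("utf-8")

# Reading UTF-8 bytes as latin1 maps every single byte to one character.
garbled = utf8_bytes.decode("latin-1")

# "А" (U+0410) is the two bytes D0 90 in UTF-8. As latin1 that is "Ð"
# followed by the invisible control character U+0090.
first_letter = "А".encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo one round of latin1 misdecoding (hypothetical helper).

    Raises UnicodeError if the text was not produced that way.
    """
    return text.encode("latin-1").decode("utf-8")


print(repr(first_letter))
print(repair(garbled) == original)
```

If a dumped file contains text garbled this way throughout, a round trip like repair can recover it, but mixed output such as the sample above would need a closer look at the raw bytes first.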
Environment
None
Activity
Jira Bot March 4, 2021 at 3:56 PM
Hello, it's been 52 days since this issue went into Incomplete and we haven't heard from you on this.
At this point, our policy is to close this issue, to keep things from getting too cluttered. If you have more information about this issue and wish to reopen it, please reply with a comment containing "jira-bot=reopen".
Jira Bot February 24, 2021 at 3:56 PM
Hello, it's jira-bot again. Your bug report is important to us, but we haven't heard from you since the previous notification. If we don't hear from you on this in 7 days, the issue will be automatically closed.
Jira Bot February 9, 2021 at 2:56 PM
Hello, I'm jira-bot, Percona's automated helper script. Your bug report is important to us, but we've been unable to reproduce it and have asked you for more information. If we haven't heard from you on this in 3 more weeks, the issue will be automatically closed.
Lalit Choudhary January 11, 2021 at 1:58 PM
Hi
Thank you for the report.
I can't reproduce the described case with PT version 3.2.1.
Hello. I am trying to dump a TSV file with pt-archiver.
pt-archiver --no-delete --source h=10.2.0.103,D=stat,t=CallStatV4 --limit=1000 --where "id > 1350363818" --file load_to_clickhouse.tsv
I get an error:
Character set mismatch: --source DSN uses latin1, table uses utf8. You can disable this check by specifying --no-check-charset.
But utf8 is set everywhere on the server the DSN points to:
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = "stat";
+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8                       |
+----------------------------+
mysql> SELECT CCSA.character_set_name FROM information_schema.`TABLES` T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "stat" AND T.table_name = "CallStatV4";
+--------------------+
| character_set_name |
+--------------------+
| utf8               |
+--------------------+
If I pass the option --charset=utf8 to pt-archiver, there is no error, but Cyrillic characters in the file are invalid, like this: "name": "Ðнжелика"