Details
Assignee: Unassigned
Reporter: Michaël de Groot
Priority: Medium
Components:
Affects versions:
Needs QA: Yes
Created last week
Updated last week
Hello,
I created a script that makes pt-table-checksum results monitorable and automatically retries if tables or chunks were skipped. The script can be found here: https://gitlab.com/de-groot-consultancy-ansible-roles/dba-toolkit/-/blob/checksums_missing_tables/templates/checksum.sh.j2?ref_type=heads. After I merge it, it will be in the same file on the beta branch.
I noticed that pt-table-checksum sometimes stops working and produces a lot of “MySQL has gone away” errors. This means a lot of tables are skipped, so the tool correctly sets exit code 64 (table skipped).
Perhaps it also sets exit code 32 (chunk skipped). However, given the error message, it cannot record this in the database by inserting a row that marks a chunk as skipped.
The exit status also included exit code 1 (any non-fatal error). So I have decided that if I receive exit code 1 (or 4 or 8), I delete the checksum results of the most recently checksummed table, because it is not guaranteed that the checksum process completed for that table.
If the checksum of that table did in fact complete, this deletes a valid result, which is a pity. And if the error that prevents the process from finishing persists, the script ends up deleting the results of the last 10 tables that were checksummed (10 is the default number of retries).
Therefore I would like to request a new feature: could you add a bit to the exit code bitmask that indicates that the checksum process stopped partway through a table, and that the last table’s checksum results should therefore be deleted?
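To illustrate the workaround described above, here is a minimal sketch of how the exit status could be decoded. It is not the linked script itself: the function names `has_flag`, `should_discard_last_table`, and `needs_retry` are hypothetical, and the bit values are the ones mentioned in this report, which I understand to match the EXIT STATUS section of the pt-table-checksum documentation.

```shell
#!/usr/bin/env bash
# Bit values of the pt-table-checksum exit status, as referred to in this report.
FLAG_ERROR=1          # a non-fatal error occurred
FLAG_CAUGHT_SIGNAL=4  # the tool was interrupted by a signal
FLAG_NO_REPLICAS=8    # no replicas or cluster nodes were found
FLAG_SKIP_CHUNK=32    # at least one chunk was skipped
FLAG_SKIP_TABLE=64    # at least one table was skipped

# has_flag STATUS FLAG: succeeds when the bit FLAG is set in STATUS.
has_flag() {
  (( ($1 & $2) != 0 ))
}

# should_discard_last_table STATUS (hypothetical helper):
# succeeds when the run may have stopped partway through a table.
# Bits 1, 4 and 8 cannot tell us whether the last table completed,
# so its results are treated as untrustworthy in every such case.
should_discard_last_table() {
  local status=$1
  has_flag "$status" "$FLAG_ERROR" ||
    has_flag "$status" "$FLAG_CAUGHT_SIGNAL" ||
    has_flag "$status" "$FLAG_NO_REPLICAS"
}

# needs_retry STATUS (hypothetical helper):
# succeeds when chunks or tables were skipped or the run was incomplete.
needs_retry() {
  local status=$1
  has_flag "$status" "$FLAG_SKIP_CHUNK" ||
    has_flag "$status" "$FLAG_SKIP_TABLE" ||
    should_discard_last_table "$status"
}

# Intended use, right after running pt-table-checksum:
#   status=$?
#   if should_discard_last_table "$status"; then
#     ... delete the checksum rows of the most recent table ...
#   fi
```

With a status of 65 (64 + 1), both helpers succeed, so the last table's results are deleted even if that table had actually finished. That lost result is exactly the case the requested extra bit would avoid.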
Thank you!