pt-online-schema-change race overwrites new_table_name
General
Escalation
Environment
None
Activity
Sveta Smirnova March 13, 2025 at 4:50 PM
https://perconadev.atlassian.net/browse/PT-2355 is not the same issue. This one is about the wrong table name being written to the history table, while that one is about an error when resuming a job with NULL boundaries.
Aaditya Dubey February 3, 2025 at 12:21 PM
Hi @Perry Harrington
Thank you for the report.
This issue is known to us and is being tracked in PT-2355 (https://perconadev.atlassian.net/browse/PT-2355). However, we are not marking this task as a duplicate because it points to the code where the issue occurs. It should be fixed in the upcoming PT release.
Done
Details
Assignee
Sveta Smirnova
Reporter
Perry Harrington
Needs QA
Yes
Created January 27, 2025 at 10:24 PM
Updated 2 days ago
Resolved 2 days ago
Description
If you are using --history with pt-osc, the query that updates the history entry with the new_table_name does not constrain the UPDATE with a key, so it overwrites ALL entries in the history table. If you execute multiple migrations in parallel, they end up stepping on each other. If you then resume a migration, you can end up in a scenario where only a portion of the key space is copied to the new table before it is swapped in.
The problem is this stanza of code, starting on line 9689 (v3.6.0):
    if ( $o->get('history') ) {
       # No WHERE clause: every row in the history table gets this name.
       my $sth = $cxn->dbh()->prepare(
          "UPDATE ${hist_table} SET new_table_name = ?"
       );
       $sth->execute($new_tbl->{tbl});
    }
I have made and tested this change, which appears to do the correct thing now:
    if ( $o->get('history') ) {
       # Restrict the update to this run's own history row.
       my $sth = $cxn->dbh()->prepare(
          "UPDATE ${hist_table} SET new_table_name = ? WHERE job_id = ?"
       );
       $sth->execute($new_tbl->{tbl}, $job_id);
    }
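To make the effect of the missing WHERE clause concrete, here is a minimal sketch that models the history table as an in-memory list of rows, one per job, and replays two parallel runs registering their working tables. It is not pt-osc code: the helpers fresh_history, update_unconstrained, update_by_job and name_for_job are hypothetical and only mimic which rows each UPDATE touches. The table names are the ones from this report.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of the --history table: one hashref per job.
# This is NOT pt-osc code; it only mimics which rows each UPDATE touches.
sub fresh_history {
    return [ map { +{ job_id => $_, new_table_name => undef } } 1 .. 2 ];
}

# Models: UPDATE hist SET new_table_name = ?   (no WHERE clause)
sub update_unconstrained {
    my ($hist, $name) = @_;
    $_->{new_table_name} = $name for @$hist;    # every row is rewritten
}

# Models: UPDATE hist SET new_table_name = ? WHERE job_id = ?
sub update_by_job {
    my ($hist, $name, $job_id) = @_;
    $_->{new_table_name} = $name for grep { $_->{job_id} == $job_id } @$hist;
}

# What a resumed job would read back for its own row.
sub name_for_job {
    my ($hist, $job_id) = @_;
    my ($row) = grep { $_->{job_id} == $job_id } @$hist;
    return $row->{new_table_name};
}

# Two parallel runs: job 1 registers its table, then job 2 registers its own.
my $buggy = fresh_history();
update_unconstrained($buggy, '___table_new');   # job 1
update_unconstrained($buggy, '__table_new');    # job 2 clobbers job 1's row

my $fixed = fresh_history();
update_by_job($fixed, '___table_new', 1);
update_by_job($fixed, '__table_new',  2);

# Prints "unconstrained: job 1 resumes with __table_new"
printf "unconstrained: job 1 resumes with %s\n", name_for_job($buggy, 1);
# Prints "with job_id:   job 1 resumes with ___table_new"
printf "with job_id:   job 1 resumes with %s\n", name_for_job($fixed, 1);
```

With the unconstrained statement, job 1 reads back the other run's table name, which matches the mismatch between the log output and the history table described below; keying the UPDATE on job_id leaves each run's row untouched by the others.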
We observed this behavior while testing the resume functionality. It manifested as the log output showing ___table_new, but the history table containing __table_new, which came from a different invocation of pt-osc. The end result was that approximately 1.9 million keys were copied into the ___table_new working table, while 2.1 million keys were copied into a table called __table_new that was missing the column being added by the migration. As a result, table ended up containing 2.1 million keys and was missing the column the migration added.