Details
Assignee: Unassigned
Reporter: Przemyslaw Malkowski
Priority: Medium
Affects versions:
Needs QA: Yes
Created October 28, 2022 at 12:19 PM
Updated February 29, 2024 at 8:44 PM
By default, pt-slave-restart resumes replication by trying to skip the offending event. For error 3002, however, this approach is wrong: not only does it fail to resume replication, it may also cause further harm.
For example:
Last_Errno: 3002
Last_Error: Query caused different errors on master and slave. Error on master: message (format)='Invalid error code' error code=126; Error on slave:actual message='no error', error code=0. Default database:''. Query:'DROP USER 'user1'@'%''
or:
Last_SQL_Errno: 3002
Last_SQL_Error: Query caused different errors on master and slave. Error on master: message (format)='Column count of %s.%s is wrong. Expected %d, found %d. The table is probably corrupted' error code=1805; Error on slave:actual message='no error', error code=0. Default database:'test'. Query:'drop user if exists 'test1'@'%''
Both errors mean that the master logged the query even though it resulted in an error there. The replica then applies the query, but detects that the master did not. Yet both servers are at the same position replication-wise; for example, the GTID sequence number in gtid_executed is the same. pt-slave-restart therefore tries to inject an empty transaction using the next sequence number, which is simply wrong. Replication does not resume anyway, so the tool prints something like this:
$ pt-slave-restart -S /tmp/mysql_sandbox19732.sock --error-numbers=3002 -umsandbox -pmsandbox
2022-10-28T13:37:14 S=/tmp/mysql_sandbox19732.sock,p=...,u=msandbox mysql-relay.000016 756 3002
Not checking slave because relay log file or position has not changed (file mysql-relay.000016 pos 756)
Not checking slave because relay log file or position has not changed (file mysql-relay.000016 pos 756)
Not checking slave because relay log file or position has not changed (file mysql-relay.000016 pos 756)
...
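To illustrate how a monitoring script might recognize this case instead of skipping the event, here is a minimal Python sketch. It is not how pt-slave-restart is implemented (the tool is written in Perl); the function names `parse_error_codes` and `skipping_is_safe` are hypothetical, and the regular expression only assumes the message layout shown in the two examples above.

```python
import re
from typing import Optional, Tuple

# Error number taken from the Last_Errno / Last_SQL_Errno output quoted above.
ERRNO_DIFFERENT_ERRORS = 3002

# Assumed layout: "... Error on master: ... error code=N; Error on slave:... error code=M ..."
_CODES = re.compile(
    r"Error on master:.*?error code=(\d+)\s*;\s*"
    r"Error on slave:.*?error code=(\d+)",
    re.DOTALL,
)

# First example message from this report.
SAMPLE = (
    "Query caused different errors on master and slave. "
    "Error on master: message (format)='Invalid error code' error code=126; "
    "Error on slave:actual message='no error', error code=0. "
    "Default database:''. Query:'DROP USER 'user1'@'%''"
)


def parse_error_codes(last_error: str) -> Optional[Tuple[int, int]]:
    """Return (master_code, replica_code), or None if the layout does not match."""
    match = _CODES.search(last_error)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def skipping_is_safe(errno: int, last_error: str) -> bool:
    """Hypothetical policy: never auto-skip error 3002.

    For any other error number this sketch returns True, meaning
    "defer to whatever policy the caller already has".
    """
    if errno != ERRNO_DIFFERENT_ERRORS:
        return True
    # The replica already applied the statement, so there is nothing to skip.
    return False


if __name__ == "__main__":
    print(parse_error_codes(SAMPLE))
    print(skipping_is_safe(ERRNO_DIFFERENT_ERRORS, SAMPLE))
```

A replica code of 0 alongside a non-zero master code matches the situation described here: the statement succeeded on the replica, so skipping or injecting an empty transaction cannot help.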
Related upstream bug: https://bugs.mysql.com/bug.php?id=85623