pt-slave-restart acts incorrectly when replication breaks with: Last_SQL_Errno: 3002

Description

By default, pt-slave-restart resumes replication by trying to skip the offending event. For error 3002 this is the wrong approach: it does not allow replication to resume, and it may cause further harm.

For example:

 

Last_Errno: 3002
Last_Error: Query caused different errors on master and slave. Error on master: message (format)='Invalid error code' error code=126; Error on slave:actual message='no error', error code=0. Default database:''. Query:'DROP USER 'user1'@'%''

or: 

Last_SQL_Errno: 3002
Last_SQL_Error: Query caused different errors on master and slave. Error on master: message (format)='Column count of %s.%s is wrong. Expected %d, found %d. The table is probably corrupted' error code=1805; Error on slave:actual message='no error', error code=0. Default database:'test'. Query:'drop user if exists 'test1'@'%''

 

 

Both errors mean that the master wrote the query to its binary log even though the query failed there. The replica then applies the query without error, but detects that the master did not. Both servers are nevertheless at the same replication position; for example, the GTID sequence number in gtid_executed is the same. pt-slave-restart therefore tries to inject an empty transaction using the next sequence number, which is wrong. Replication does not resume either, so the tool prints output like this:

$ pt-slave-restart -S /tmp/mysql_sandbox19732.sock --error-numbers=3002 -umsandbox -pmsandbox
2022-10-28T13:37:14 S=/tmp/mysql_sandbox19732.sock,p=...,u=msandbox mysql-relay.000016         756 3002
Not checking slave because relay log file or position has not changed (file mysql-relay.000016 pos 756)
Not checking slave because relay log file or position has not changed (file mysql-relay.000016 pos 756)
Not checking slave because relay log file or position has not changed (file mysql-relay.000016 pos 756)
...
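
To illustrate how this case could be told apart from errors that are safe to skip, here is a minimal Python sketch that inspects the Last_SQL_Errno and Last_SQL_Error values from the replica status. It is not part of pt-slave-restart or Percona Toolkit. The function name classify_3002 and its return labels are hypothetical, and the regular expression assumes the message layout seen in the two examples above.

```python
import re

# Assumption: the error text follows the layout shown in the two examples
# in this report ("Error on master: ... error code=N; Error on slave:...").
ERROR_RE = re.compile(
    r"Error on master: message \(format\)='(?P<master_msg>.*?)' "
    r"error code=(?P<master_code>\d+)\s*;\s*"
    r"Error on slave:\s*actual message='(?P<slave_msg>.*?)', "
    r"error code=(?P<slave_code>\d+)"
)


def classify_3002(last_sql_errno, last_sql_error):
    """Classify a replication error (hypothetical helper, illustrative labels).

    Returns one of:
      'not-3002'                 - a different error number
      'unparsed'                 - 3002, but the text did not match ERROR_RE
      'master-failed-replica-ok' - master logged an error, replica applied cleanly
      'other-mismatch'           - any other combination of error codes
    """
    if int(last_sql_errno) != 3002:
        return "not-3002"
    match = ERROR_RE.search(last_sql_error)
    if match is None:
        return "unparsed"
    master_code = int(match.group("master_code"))
    slave_code = int(match.group("slave_code"))
    if master_code != 0 and slave_code == 0:
        # The case described in this report: skipping the event does not help.
        return "master-failed-replica-ok"
    return "other-mismatch"


if __name__ == "__main__":
    sample = (
        "Query caused different errors on master and slave. "
        "Error on master: message (format)='Invalid error code' error code=126; "
        "Error on slave:actual message='no error', error code=0. "
        "Default database:''. Query:'DROP USER 'user1'@'%''"
    )
    print(classify_3002(3002, sample))
```

A wrapper script could use a check like this to avoid passing --error-numbers=3002 to the tool when the result is 'master-failed-replica-ok' and to flag the replica for manual review instead.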

 

Related upstream bug: https://bugs.mysql.com/bug.php?id=85623

Environment

None

AFFECTED CS IDs

CS0031281

Needs QA

Yes

Created October 28, 2022 at 12:19 PM
Updated February 29, 2024 at 8:44 PM