Restore the behavior of safe-slave-backup from before the lock-ddl implementation
Activity

Marcelo Altmann February 2, 2023 at 3:34 PM
In summary:
Server Behavior:
The current server behavior is correct: STOP SLAVE is allowed when it is issued from a session other than the one that executed LOCK TABLES FOR BACKUP (LTFB).
STOP SLAVE will hang if any of the slave worker threads is waiting on LTFB while executing a DDL.
As soon as the session that holds LTFB executes UNLOCK TABLES, the slave worker thread can proceed with its DDL, and once the DDL completes, STOP SLAVE completes too.
A deadlock only occurs if STOP SLAVE is executed in the same session that holds LTFB.
Percona XtraBackup Behavior:
The current PXB behavior is correct. If we have to execute STOP SLAVE, it has to be done before LTFB/LIFB (LOCK INSTANCE FOR BACKUP); otherwise it will be blocked, because the global instance lock has been acquired in the same session that executes STOP SLAVE. Even if we open a new connection in PXB to execute STOP SLAVE before copying the non-InnoDB tables, we can still deadlock when a slave worker thread is waiting on LTFB to execute a DDL: STOP SLAVE will hang waiting for PXB to execute UNLOCK TABLES, which only happens at the end of the backup.
We will continue to execute STOP SLAVE before LTFB/LIFB.
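To illustrate the reasoning above, here is a minimal sketch in Python that models each scenario as a wait-for graph and checks it for a cycle. It is not PXB or server code: the session names and the `find_cycle` helper are hypothetical. The edge from the backup connection to the STOP SLAVE connection encodes the assumption that PXB waits for STOP SLAVE to return before it continues towards UNLOCK TABLES.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    An edge A -> B means "A cannot make progress until B does".
    A cycle means no participant can ever proceed, i.e. a deadlock.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(waits_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Hypothetical scenario 1: STOP SLAVE from an independent session.
# It waits for the worker, the worker waits for the LTFB holder, and the
# holder is free to run UNLOCK TABLES. A hang, but not a deadlock.
separate_session = {
    "stop_slave_session": ["replica_worker"],
    "replica_worker": ["ltfb_session"],
}

# Hypothetical scenario 2: STOP SLAVE in the session that holds LTFB.
# The session waits for the worker, and the worker waits for that session.
same_session = {
    "ltfb_session": ["replica_worker"],
    "replica_worker": ["ltfb_session"],
}

# Hypothetical scenario 3: PXB opens a second connection for STOP SLAVE.
# Assumption: the backup connection does not reach UNLOCK TABLES until
# the STOP SLAVE connection returns.
pxb_second_connection = {
    "pxb_stop_slave_conn": ["replica_worker"],
    "replica_worker": ["pxb_backup_conn"],
    "pxb_backup_conn": ["pxb_stop_slave_conn"],
}

for name, graph in [
    ("separate session", separate_session),
    ("same session", same_session),
    ("PXB second connection", pxb_second_connection),
]:
    print(name, "->", find_cycle(graph))
```

Under these assumptions, only the first scenario is cycle-free, which matches the conclusion that STOP SLAVE has to run before the backup lock is taken.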

Yura Sorokin January 12, 2023 at 2:42 PM
In my opinion this is a bug.
Similarly to LIFB, "STOP SLAVE" under LTFB should not be allowed on any session.
This needs to be fixed in the PS code.

Marcelo Altmann January 9, 2023 at 3:13 PM
Can you please validate/explain why we allow STOP SLAVE under LTFB on a different session? This is most likely a bug that has to be fixed on the server side.
FYI, this was introduced at . Upstream had the same issue with LIFB and they block all sessions - see https://github.com/mysql/mysql-server/commit/7e75ca24a273bd2e1aaa662e6712025c4dd8e0ca

Sveta Smirnova November 18, 2022 at 4:19 PM
Hi
If LOCK TABLES FOR BACKUP; and STOP SLAVE; run in one connection, they conflict. However, it is possible to run STOP SLAVE in a separate connection without any issue.
In another connection:
Check replica status in any of them:
LOCK INSTANCE FOR BACKUP blocks STOP REPLICA/SLAVE operation no matter from which connection it is called:
Another connection:

Marcelo Altmann November 15, 2022 at 4:59 PM
Hi
I implemented the work required on this ticket; however, it seems that test 3 is actually blocked too:
Can you please again?

The behavior of STOP SLAVE changed in version 8.0.22, when the "--lock-ddl" option was added. Since that version, STOP SLAVE is executed before copying the InnoDB files.
Could xtrabackup go back to the old behavior, executing STOP SLAVE after copying the InnoDB tables?