PXB stopping SQL thread while backing up replica

Description

I'm executing an xtrabackup and it is stopping the SQL thread on my replica right at the start of the backup. The SQL thread remains stopped until I restart it.

My guess is that the --safe-slave-backup option is causing the issue, but I haven't tested to confirm.

Running the following xtrabackup command: xtrabackup --defaults-extra-file=/data/etc/mysql/bkup.cnf --backup --slave-info --safe-slave-backup --stream=xbstream --parallel=4 --databases-exclude=_pending_drops | pigz --fast -p 1 -c > /data/backups/20230413122358/innobackup_vam-mysql-jaytest-002-02.iad3b.square-20230413122358.xbstream.gz
2023-04-13T12:24:00.358669-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized server arguments: --datadir=/data/mysql --tmpdir=/data/tmp --server-id=170033278 --log_bin=mysql-bin --innodb_buffer_pool_size=50G --innodb_flush_method=O_DIRECT
2023-04-13T12:24:00.359652-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized client arguments: --port=3306 --socket=/var/lib/mysql/mysql.sock --user=bkup --password=* --user=bkup --password=* --backup=1 --slave-info=1 --safe-slave-backup=1 --stream=xbstream --parallel=4 --databases-exclude=_pending_drops
xtrabackup version 8.0.31-24 based on MySQL server 8.0.31 Linux (x86_64) (revision id: f0754edb)
230413 12:24:00 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;port=3306;mysql_socket=/var/lib/mysql/mysql.sock' as 'bkup' (using password: YES).
230413 12:24:00 version_check Connected to MySQL server
230413 12:24:00 version_check Executing a version check against the server...
230413 12:24:00 version_check Done.
2023-04-13T12:24:00.494339-00:00 0 [Note] [MY-011825] [Xtrabackup] Connecting to MySQL server host: localhost, user: bkup, password: set, port: 3306, socket: /var/lib/mysql/mysql.sock
2023-04-13T12:24:00.520095-00:00 0 [Note] [MY-011825] [Xtrabackup] Using server version 8.0.31-23
2023-04-13T12:24:00.552525-00:00 0 [Note] [MY-011825] [Xtrabackup] Slave open temp tables: 0
2023-04-13T12:24:00.554678-00:00 0 [Note] [MY-011825] [Xtrabackup] Slave is safe to backup.
2023-04-13T12:24:00.554789-00:00 0 [Note] [MY-011825] [Xtrabackup] Executing LOCK TABLES FOR BACKUP ...
2023-04-13T12:24:00.564691-00:00 0 [Note] [MY-011825] [Xtrabackup] uses posix_fadvise().
2023-04-13T12:24:00.564749-00:00 0 [Note] [MY-011825] [Xtrabackup] cd to /data/mysql

Environment

xtrabackup version 8.0.31-24 based on MySQL server 8.0.31 Linux (x86_64) (revision id: f0754edb)
Server version: 8.0.31-23 Percona Server (GPL), Release 23.sq, Revision 771bf6f2ea43b51760313b09218ab30bb6b4156a

Activity

Aaditya Dubey January 17, 2024 at 3:17 PM

Hi

Here is why the SQL thread is stopped during the entire backup process:

The --safe-slave-backup option

In order to assure a consistent replication state, this option stops the replication SQL thread and waits to start backing up until Slave_open_temp_tables in SHOW STATUS is zero. If there are no open temporary tables, the backup will take place, otherwise the SQL thread will be started and stopped until there are no open temporary tables. The backup will fail if Slave_open_temp_tables does not become zero after --safe-slave-backup-timeout seconds (defaults to 300 seconds). The replication SQL thread will be restarted when the backup finishes.
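The wait loop described above can be modelled roughly as follows. This is only a sketch of the documented behaviour, not the XtraBackup source: the function name, the injected callables, and the 3-second poll interval are assumptions made for illustration, and only the 300-second default timeout comes from the text above.

```python
import time
from typing import Callable


def wait_until_safe(
    stop_sql_thread: Callable[[], None],
    start_sql_thread: Callable[[], None],
    open_temp_tables: Callable[[], int],
    timeout_s: float = 300.0,   # documented default of --safe-slave-backup-timeout
    poll_s: float = 3.0,        # assumed poll interval, for illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True with the SQL thread stopped once no temp tables are open.

    Return False, with the SQL thread left running, if the timeout elapses.
    """
    waited = 0.0
    stop_sql_thread()
    while open_temp_tables() != 0:
        # Let replication make progress so the temp tables can be dropped.
        start_sql_thread()
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
        stop_sql_thread()
    # Slave_open_temp_tables is zero: the backup may start, thread stays stopped.
    return True
```

On the `True` path the SQL thread is still stopped when the function returns, which matches the report: replication stays down until something restarts it after the backup.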

Reference:

There is indeed a bug: if XtraBackup fails for any reason, the replication SQL thread is not restarted automatically and stays at “Replica_SQL_Running: No”. Please note that this applies when the --safe-slave-backup option is used:

I am sending this to engineering for further review and updates.
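Until this is fixed, one possible workaround is to wrap the backup so that the SQL thread is restarted whenever the backup exits with a non-zero status. The following is a minimal sketch, not an official recommendation: `run_backup` and `restart_sql_thread_via_mysql_cli` are hypothetical helper names, and the default restart helper assumes the `mysql` client is on `PATH` and reads its credentials from option files.

```python
import subprocess
from typing import Callable, Sequence


def restart_sql_thread_via_mysql_cli() -> None:
    # Assumption: the `mysql` client is on PATH and finds credentials in its
    # option files. START REPLICA requires MySQL 8.0.22 or later.
    subprocess.run(["mysql", "-e", "START REPLICA SQL_THREAD"], check=True)


def run_backup(
    backup_cmd: Sequence[str],
    restart_sql_thread: Callable[[], None] = restart_sql_thread_via_mysql_cli,
) -> int:
    """Run the backup command; restart the SQL thread if it exits non-zero."""
    result = subprocess.run(list(backup_cmd))
    if result.returncode != 0:
        # The backup failed, so the SQL thread may have been left stopped.
        restart_sql_thread()
    return result.returncode
```

If the backup runs through a shell pipeline like the `| pigz` one in the report, keep in mind that the pipeline's exit status is that of the last command unless `set -o pipefail` is in effect, so a failing xtrabackup could go unnoticed by a wrapper like this.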

Aaditya Dubey May 18, 2023 at 8:59 AM

Hi,

Thank you for the update. I will take a further look and see whether this can be reproduced.

Jay Janssen May 17, 2023 at 2:34 PM

Hi, I believe the scenario requires xtrabackup to exit 1 somehow. If you look at my log above, xtrabackup got this error when I observed the behavior.  

Can you trigger an error exit by sending XB a signal perhaps?
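One way to try this could be a small harness that starts the backup, waits a moment, and then sends it a signal. This is only a sketch: `run_and_interrupt` is a hypothetical helper, the choice of SIGTERM is an assumption, and how xtrabackup itself reacts to the signal (and which exit status it reports) has not been verified here.

```python
import signal
import subprocess
import time
from typing import Sequence


def run_and_interrupt(
    cmd: Sequence[str],
    delay_s: float,
    sig: int = signal.SIGTERM,
) -> int:
    """Start cmd, send it sig after delay_s seconds, and return its exit status.

    On POSIX, a process terminated by signal N is reported as -N.
    """
    proc = subprocess.Popen(list(cmd))
    time.sleep(delay_s)
    if proc.poll() is None:  # only signal the process if it is still running
        proc.send_signal(sig)
    return proc.wait()
```

After the interrupted run, checking `Replica_SQL_Running` in `SHOW REPLICA STATUS` would show whether the SQL thread was left stopped.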

Aaditya Dubey May 17, 2023 at 1:18 PM

Hi,

Thank you for the report.
Unfortunately, I could not reproduce this issue on my end. Please find the test case below:

Please let us know if a fully reproducible test case is available.

Jay Janssen April 17, 2023 at 2:44 PM

Got it, and that's a good point.

My main issue is more that the SQL thread is now stopped for the entire backup (though its not being restarted after errors is an issue too). I presume this is because explicit temp tables are now InnoDB-based.

The only way around replication being down this long I can see is to just not use `--safe-slave-backup` and trust that my users are not switching binlog formats in session as well as using explicit temp tables to load into permanent tables.

Details

Needs QA

Yes

Created April 13, 2023 at 12:39 PM
Updated July 22, 2024 at 1:18 PM