PXB stopping SQL thread while backing up replica
Description
Environment
xtrabackup version 8.0.31-24 based on MySQL server 8.0.31 Linux (x86_64) (revision id: f0754edb)
Server version: 8.0.31-23 Percona Server (GPL), Release 23.sq, Revision 771bf6f2ea43b51760313b09218ab30bb6b4156a
Activity
Hi @Jay Janssen,
Please find the answers below regarding why the SQL thread is stopped during the entire backup process:
The --safe-slave-backup option
In order to assure a consistent replication state, this option stops the replication SQL thread and waits to start backing up until Slave_open_temp_tables in SHOW STATUS is zero. If there are no open temporary tables, the backup will take place; otherwise the SQL thread will be started and stopped until there are no open temporary tables. The backup will fail if Slave_open_temp_tables does not become zero after --safe-slave-backup-timeout seconds (defaults to 300 seconds). The replication SQL thread will be restarted when the backup finishes.
Reference: https://docs.percona.com/percona-xtrabackup/8.0/make-backup-in-replication-env.html
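For reference, the check and the stop/start that this option performs can be reproduced by hand on the replica. A minimal sketch with the mysql client, reusing the sandbox socket and credentials from the commands in this report (illustrative only, not the exact statements xtrabackup issues):

# What --safe-slave-backup checks before it starts copying data:
$ mysql -umsandbox -pmsandbox --socket=/tmp/mysql_sandbox22133.sock \
    -e "SHOW STATUS LIKE 'Slave_open_temp_tables'"

# What it does while that counter is non-zero: stop the SQL thread, wait, retry.
$ mysql -umsandbox -pmsandbox --socket=/tmp/mysql_sandbox22133.sock \
    -e "STOP REPLICA SQL_THREAD"

# What it is expected to do once the backup finishes (and what is missing after a failed run):
$ mysql -umsandbox -pmsandbox --socket=/tmp/mysql_sandbox22133.sock \
    -e "START REPLICA SQL_THREAD"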
The bug is indeed there: if XtraBackup fails for any reason, the SQL thread ("Replica_SQL_Running: No") is not started again automatically. Please note this is the case with the --safe-slave-backup option:
When using PS 8.0.31 & PXB 8.0.31
Sysbench started:
$ sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=1000 --tables=10 --mysql-db=test --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox22133.sock --threads=15 --time=0 --report-interval=1 --events=0 --db-driver=mysql run
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 15
Report intermediate results every 1 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
...
Backup started:
$ ./xtrabackup --user=msandbox --password=msandbox --socket=/tmp/mysql_sandbox22133.sock --backup --slave-info --safe-slave-backup --stream=xbstream --parallel=4 --datadir=$HOME/sandboxes/rsandbox_ps8_0_31/node1/data/ --target-dir=$HOME/backup_full/node1/data/
...
Killed in between:
slave1 [localhost:22133] {msandbox} ((none)) > show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 127.0.0.1
Source_User: rsandbox
Source_Port: 22132
Connect_Retry: 60
Source_Log_File: mysql-bin.000001
Read_Source_Log_Pos: 8330
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 8546
Relay_Source_Log_File: mysql-bin.000001
Replica_IO_Running: Yes
Replica_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Source_Log_Pos: 8330
Relay_Log_Space: 8752
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Source_Server_Id: 100
Source_UUID: 00022132-1111-1111-1111-111111111111
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State:
Source_Retry_Count: 86400
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 1
Network_Namespace:
1 row in set (0.00 sec)
It is stopped and does not start again; a manual restart is needed.
The same behaviour exists with PXB 8.0.35 as well.
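Until this is addressed, recovery after a failed run is manual: restart the SQL thread and confirm both threads are running again. For example, against the same sandbox replica:

$ mysql -umsandbox -pmsandbox --socket=/tmp/mysql_sandbox22133.sock \
    -e "START REPLICA SQL_THREAD"
$ mysql -umsandbox -pmsandbox --socket=/tmp/mysql_sandbox22133.sock \
    -e "SHOW REPLICA STATUS\G" | grep -E 'Replica_(IO|SQL)_Running:'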
Sending the concern to engineering for further review and updates.
Hi @Jay Janssen,
Thank you for the updates. We will take a further look and see if this can be reproduced.
Hi, I believe the scenario requires xtrabackup to exit 1 somehow. If you look at my log above, xtrabackup got this error when I observed the behavior.
2023-04-12T15:56:10.668127-00:00 1 [ERROR] [MY-012894] [InnoDB] Unable to open './#innodb_redo/#ib_redo8729' (error: 1504).
Can you trigger an error exit by sending XB a signal perhaps?
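For example, something along these lines while the backup is streaming (untested on my side, just a guess at a reproduction step):

# Terminate the running backup mid-copy so xtrabackup exits non-zero;
# per my report above, the replica's SQL thread is then left stopped.
$ pkill -TERM -f 'xtrabackup.*--safe-slave-backup'
# or simulate a harder failure:
$ pkill -KILL -f 'xtrabackup.*--safe-slave-backup'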
Hi @Jay Janssen,
Thank you for the report.
Unfortunately, this issue does not reproduce on my end. Please find the test case below:
$ xtrabackup --version
2023-05-17T13:13:09.229434-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized server arguments: --datadir=/var/lib/mysql
xtrabackup version 8.0.31-24 based on MySQL server 8.0.31 Linux (x86_64) (revision id: f0754edb)
slave1 [localhost:22133] {msandbox} ((none)) > select version();
+-----------+
| version() |
+-----------+
| 8.0.31-23 |
+-----------+
1 row in set (0.00 sec)
Executed a read/write load on the master server using the sysbench command below:
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql_storage_engine=innodb --table-size=1000 --tables=10 --mysql-db=test --mysql-user=xxxx --mysql-password=xxxx --mysql-socket=/tmp/mysql_sandbox22132.sock --threads=15 --time=0 --report-interval=1 --events=0 --db-driver=mysql run
Backup Command:
$ xtrabackup --user=xxxx --password=xxxx --socket=/tmp/mysql_sandbox22133.sock --backup --slave-info --safe-slave-backup --stream=xbstream --parallel=4 --datadir=$HOME/sandboxes/rsandbox_ps8_0_31/node1/data/ --target-dir=$HOME/backup_full/node1/data/
...
XBSTCK01Extrabackup_infoXBSTCK01Pxtrabackup_tablespaces''?4{"version":3,"external_tablespaces":[]}XBSTCK01Extrabackup_tablespaces
2023-05-17T13:07:14.414684-00:00 0 [Note] [MY-011825] [Xtrabackup] Transaction log of lsn (30844188) to (46045387) was copied.
2023-05-17T13:07:14.704853-00:00 0 [Note] [MY-011825] [Xtrabackup] completed OK!
Replica_SQL_Running: No is seen during the backup, but the SQL thread is restarted after the backup completes.
slave1 [localhost:22133] {msandbox} ((none)) > show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 127.0.0.1
Source_User: rsandbox
Source_Port: 22132
Connect_Retry: 60
Source_Log_File: mysql-bin.000001
Read_Source_Log_Pos: 48655530
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 23568323
Relay_Source_Log_File: mysql-bin.000001
Replica_IO_Running: Yes
Replica_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Source_Log_Pos: 23568107
Relay_Log_Space: 48655952
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Source_Server_Id: 100
Source_UUID: 00022132-1111-1111-1111-111111111111
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State:
Source_Retry_Count: 86400
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 1
Network_Namespace:
1 row in set (0.00 sec)
Please let us know if a fully reproducible test case is available.
Got it, and that's a good point.
My main issue is more that the SQL thread is now stopped for the entire backup (though the thread not being restarted after errors is an issue too). I presume this is because explicit temp tables are now InnoDB-based.
The only way I can see around replication being down for this long is to simply not use `--safe-slave-backup` and trust that my users are not switching binlog formats in-session while also using explicit temp tables to load into permanent tables.
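For the secondary issue (the SQL thread not being restarted after a failed run), a small wrapper could restart it regardless of how xtrabackup exits. A rough sketch based on my command below; it assumes bkup.cnf carries [client] credentials the mysql client can also read, and the output path is a placeholder:

#!/bin/bash
# Stream the backup; pipefail makes $? non-zero if any stage, including xtrabackup, fails.
set -o pipefail
xtrabackup --defaults-extra-file=/data/etc/mysql/bkup.cnf --backup --slave-info \
    --safe-slave-backup --stream=xbstream --parallel=4 \
    | pigz --fast -p 1 -c > /data/backups/backup.xbstream.gz
status=$?

# If the backup failed, xtrabackup may have left the replica's SQL thread stopped;
# restart it ourselves before exiting with the original status.
if [ "$status" -ne 0 ]; then
    mysql --defaults-extra-file=/data/etc/mysql/bkup.cnf -e "START REPLICA SQL_THREAD"
fi
exit "$status"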
I'm running xtrabackup and it is stopping the SQL thread on my replica right at the start of the backup. The SQL thread remains stopped until I restart it.
My guess is that it is the --safe-slave-backup option causing the issue, but I haven't tested to confirm.
Running the following xtrabackup command:
xtrabackup --defaults-extra-file=/data/etc/mysql/bkup.cnf --backup --slave-info --safe-slave-backup --stream=xbstream --parallel=4 --databases-exclude=_pending_drops | pigz --fast -p 1 -c > /data/backups/20230413122358/innobackup_vam-mysql-jaytest-002-02.iad3b.square-20230413122358.xbstream.gz
2023-04-13T12:24:00.358669-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized server arguments: --datadir=/data/mysql --tmpdir=/data/tmp --server-id=170033278 --log_bin=mysql-bin --innodb_buffer_pool_size=50G --innodb_flush_method=O_DIRECT
2023-04-13T12:24:00.359652-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized client arguments: --port=3306 --socket=/var/lib/mysql/mysql.sock --user=bkup --password=* --user=bkup --password=* --backup=1 --slave-info=1 --safe-slave-backup=1 --stream=xbstream --parallel=4 --databases-exclude=_pending_drops
xtrabackup version 8.0.31-24 based on MySQL server 8.0.31 Linux (x86_64) (revision id: f0754edb)
230413 12:24:00 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;port=3306;mysql_socket=/var/lib/mysql/mysql.sock' as 'bkup' (using password: YES).
230413 12:24:00 version_check Connected to MySQL server
230413 12:24:00 version_check Executing a version check against the server...
230413 12:24:00 version_check Done.
2023-04-13T12:24:00.494339-00:00 0 [Note] [MY-011825] [Xtrabackup] Connecting to MySQL server host: localhost, user: bkup, password: set, port: 3306, socket: /var/lib/mysql/mysql.sock
2023-04-13T12:24:00.520095-00:00 0 [Note] [MY-011825] [Xtrabackup] Using server version 8.0.31-23
2023-04-13T12:24:00.552525-00:00 0 [Note] [MY-011825] [Xtrabackup] Slave open temp tables: 0
2023-04-13T12:24:00.554678-00:00 0 [Note] [MY-011825] [Xtrabackup] Slave is safe to backup.
2023-04-13T12:24:00.554789-00:00 0 [Note] [MY-011825] [Xtrabackup] Executing LOCK TABLES FOR BACKUP ...
2023-04-13T12:24:00.564691-00:00 0 [Note] [MY-011825] [Xtrabackup] uses posix_fadvise().
2023-04-13T12:24:00.564749-00:00 0 [Note] [MY-011825] [Xtrabackup] cd to /data/mysql