PXC tarball install : Failed wsrep_sst_xtrabackup-v2 when adding new node

Description

2023-06-02T17:05:51.978288+08:00 0 [ERROR] [MY-000000] [WSREP] Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.122.172' --datadir '/data/mysql/' --basedir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/' --plugindir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/lib/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '2835' --mysqld-version '8.0.32-24.1'  --binlog 's172'

How to reproduce :

OS : CentOS Linux release 7.9.2009 (Core)

On all nodes :

yum -y install libaio socat
firewall-cmd --permanent --add-port={4444,4567,4568,9250}/tcp
firewall-cmd --reload
cd /usr/local
tar zxvf /nfs/dl/percona/pxc/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17.tar.gz
ln -s Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17 mysql
groupadd mysql
useradd -r -g mysql -s /bin/false mysql
mkdir -p /data/mysql /var/log/mysql
chown mysql:mysql /data/mysql /var/log/mysql
cat << EOF > /etc/my.cnf
[mysqld]
user                     = mysql
datadir                  = /data/mysql
server_id                = 171
log_bin                  = s171
report_host              = 192.168.122.171
log_timestamps           = SYSTEM
enforce_gtid_consistency = ON
gtid_mode                = ON
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
log_error                = /var/log/mysql/mysqld.log
pxc_encrypt_cluster_traffic = OFF
pxc_strict_mode          = ENFORCING
wsrep_sst_method         = xtrabackup-v2
wsrep_provider           = /usr/local/mysql/lib/libgalera_smm.so
wsrep_node_address       = 192.168.122.171
wsrep_node_name          = pxc-node-171
wsrep_sst_donor          = pxc-node-172,pxc-node-173,
wsrep_cluster_address    = gcomm://192.168.122.171,192.168.122.172,192.168.122.173
wsrep_cluster_name       = pxc-cluster
wsrep_slave_threads      = 8
wsrep_log_conflicts
EOF
grep '^export PATH=' ~/.bash_profile || echo ' export PATH=/usr/local/mysql/bin:$PATH ' >> ~/.bash_profile
exit

Node172 : Modify my.cnf to reflect proper IPs and node identifiers

sed -i 's/171$/172/' /etc/my.cnf
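
The `171$` anchor only rewrites lines that end in 171, so it is worth checking which settings it actually touches. A minimal sketch, run against a scratch copy holding a few of the settings from the my.cnf above (the temporary file and the `changed` variable are assumptions for illustration, not part of the original steps):

```shell
# Scratch copy holding a few settings from the my.cnf above.
cnf=$(mktemp)
cat > "$cnf" << 'EOF'
server_id                = 171
log_bin                  = s171
wsrep_node_address       = 192.168.122.171
wsrep_node_name          = pxc-node-171
wsrep_sst_donor          = pxc-node-172,pxc-node-173,
wsrep_cluster_address    = gcomm://192.168.122.171,192.168.122.172,192.168.122.173
EOF

# Same substitution as in the step above: only a trailing 171 is replaced.
sed -i 's/171$/172/' "$cnf"

# Count the lines that now end in 172 (the per-node identifiers).
changed=$(grep -c '172$' "$cnf")
echo "lines rewritten: $changed"
cat "$cnf"
```

In this sketch the per-node identifiers are rewritten, while wsrep_cluster_address and wsrep_sst_donor are left untouched because neither line ends in 171. On node 172 the donor list therefore still names pxc-node-172, the node's own name.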

Node171 : Initialize first node.

[root@c7-171 ~]# echo $PATH
/usr/local/mysql/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/scripts:/root/bin
[root@c7-171 ~]# ls -lh /usr/local/mysql
lrwxrwxrwx. 1 root root 57 Jun  2 16:42 /usr/local/mysql -> Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17
[root@c7-171 ~]# mysqld --no-defaults --initialize-insecure --user=mysql --datadir=/data/mysql  --lc-messages-dir=/usr/local/mysql/share
2023-06-02T08:50:32.742019Z 0 [Warning] [MY-000000] [WSREP] Node is running in bootstrap/initialize mode. Disabling pxc_strict_mode checks
2023-06-02T08:50:32.742654Z 0 [System] [MY-013169] [Server] /usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/bin/mysqld (mysqld 8.0.32-24.1) initializing of server in progress as process 1693
2023-06-02T08:50:32.745208Z 0 [Note] [MY-000000] [Galera] Loading provider none initial position: 00000000-0000-0000-0000-000000000000:-1
2023-06-02T08:50:32.745220Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library 'none'
2023-06-02T08:50:32.764469Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2023-06-02T08:50:34.213153Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2023-06-02T08:50:36.538782Z 5 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.

Node171 : Bootstrap

[root@c7-171 ~]# mysqld --defaults-file=/etc/my.cnf --wsrep-new-cluster --daemonize
mysqld will log errors to /var/log/mysql/mysqld.log
mysqld is running as pid 1735
[root@c7-171 ~]# mysql -e"SHOW GLOBAL STATUS WHERE VARIABLE_NAME IN ('wsrep_local_state','wsrep_local_state_comment','wsrep_cluster_size','wsrep_cluster_status','wsrep_connected','wsrep_ready')"
+---------------------------+---------+
| Variable_name             | Value   |
+---------------------------+---------+
| wsrep_cluster_size        | 1       |
| wsrep_cluster_status      | Primary |
| wsrep_connected           | ON      |
| wsrep_local_state         | 4       |
| wsrep_local_state_comment | Synced  |
| wsrep_ready               | ON      |
+---------------------------+---------+

Node172 : Try to join the 2nd node; this is where the error occurs

[root@c7-172 ~]# echo $PATH
/usr/local/mysql/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/scripts:/root/bin
[root@c7-172 ~]# ls -lh /usr/local/mysql
lrwxrwxrwx. 1 root root 57 Jun  2 16:42 /usr/local/mysql -> Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17
[root@c7-172 ~]# mysqld --defaults-file=/etc/my.cnf --daemonize
mysqld will log errors to /var/log/mysql/mysqld.log
2023-06-02T17:05:52.998202+08:00 0 [ERROR] [MY-011065] [Server] Unable to determine if daemon is running: No such file or directory (rc=0).
2023-06-02T17:05:52.998207+08:00 0 [ERROR] [MY-010946] [Server] Failed to start mysqld daemon. Check mysqld error log.
[root@c7-172 ~]# less /var/log/mysql/mysqld.log
[...]
2023-06-02T17:05:51.737194+08:00 2 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
2023-06-02T17:05:51.737198+08:00 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-06-02T17:05:51.737260+08:00 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.122.172' --datadir '/data/mysql/' --basedir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/' --plugindir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/lib/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '2835' --mysqld-version '8.0.32-24.1'  --binlog 's172' )
2023-06-02T17:05:51.978288+08:00 0 [ERROR] [MY-000000] [WSREP] Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.122.172' --datadir '/data/mysql/' --basedir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/' --plugindir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/lib/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '2835' --mysqld-version '8.0.32-24.1'  --binlog 's172'
        Read: '(null)'
2023-06-02T17:05:51.978330+08:00 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.122.172' --datadir '/data/mysql/' --basedir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/' --plugindir '/usr/local/Percona-XtraDB-Cluster_8.0.32-24.1_Linux.x86_64.glibc2.17/lib/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '2835' --mysqld-version '8.0.32-24.1'  --binlog 's172' : 1 (Operation not permitted)
2023-06-02T17:05:51.978364+08:00 2 [ERROR] [MY-000000] [WSREP] Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2023-06-02T17:05:51.978443+08:00 2 [ERROR] [MY-000000] [Galera] SST request callback failed. This is unrecoverable, restart required.
2023-06-02T17:05:51.978450+08:00 2 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()

WORKAROUND

In /usr/local/mysql/bin/wsrep_sst_common, go to line 297 in the function get_mysqld_path. Disable exit-on-error (set +e) before the MYSQLD_PATH assignment on that line and re-enable it (set -e) right after the assignment:

[root@c7-172 ~]# cp -nvp /usr/local/mysql/bin/wsrep_sst_common /usr/local/mysql/bin/wsrep_sst_common.orig
‘/usr/local/mysql/bin/wsrep_sst_common’ -> ‘/usr/local/mysql/bin/wsrep_sst_common.orig’
[root@c7-172 ~]# sed -i '297s|MYSQLD_PATH=.*|set +e; MYSQLD_PATH=$(readlink -f /proc/${WSREP_SST_OPT_PARENT}/exe); set -e|' /usr/local/mysql/bin/wsrep_sst_common
[root@c7-172 ~]# sed -n '297p' /usr/local/mysql/bin/wsrep_sst_common.orig
            MYSQLD_PATH=$(readlink -f /proc/${WSREP_SST_OPT_PARENT}/exe)
[root@c7-172 ~]# sed -n '297p' /usr/local/mysql/bin/wsrep_sst_common
            set +e; MYSQLD_PATH=$(readlink -f /proc/${WSREP_SST_OPT_PARENT}/exe); set -e
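
The edit relies on how the shell's exit-on-error mode treats assignments: under `set -e`, a plain assignment whose command substitution fails ends the script, which would happen here before the script can print `ready <addr>`. A minimal sketch of that behaviour, using a path that cannot be resolved as a stand-in for the /proc/<pid>/exe link (the helper names `strict_lookup` and `tolerant_lookup` and the stand-in path are assumptions for illustration, not code from wsrep_sst_common):

```shell
# Stand-in for /proc/${WSREP_SST_OPT_PARENT}/exe: a path readlink -f cannot resolve.
# (Hypothetical; the real script reads the parent mysqld's exe link.)
missing=/nonexistent-dir/exe

# Original form: with exit-on-error active, the failed readlink ends the script.
strict_lookup() {
    bash -c 'set -e; MYSQLD_PATH=$(readlink -f "$1"); echo "reached end"' _ "$1"
}

# Workaround form: exit-on-error is suspended around the assignment only.
tolerant_lookup() {
    bash -c 'set -e; set +e; MYSQLD_PATH=$(readlink -f "$1"); set -e; echo "reached end"' _ "$1"
}

strict_lookup "$missing" || echo "strict form aborted with status $?"
tolerant_lookup "$missing" || echo "tolerant form aborted with status $?"
```

Only the tolerant form gets as far as printing "reached end". Note that with the workaround MYSQLD_PATH may simply be left empty when readlink fails, so this hides the failure rather than resolving its cause.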

We should be able to add the new node now :

[root@c7-172 ~]# mysqld --defaults-file=/etc/my.cnf --daemonize
mysqld will log errors to /var/log/mysql/mysqld.log
mysqld is running as pid 3220
[root@c7-172 ~]# mysql -e"SHOW GLOBAL STATUS WHERE VARIABLE_NAME IN ('wsrep_local_state','wsrep_local_state_comment','wsrep_cluster_size','wsrep_cluster_status','wsrep_connected','wsrep_ready')"
+---------------------------+---------+
| Variable_name             | Value   |
+---------------------------+---------+
| wsrep_cluster_size        | 2       |
| wsrep_cluster_status      | Primary |
| wsrep_connected           | ON      |
| wsrep_local_state         | 4       |
| wsrep_local_state_comment | Synced  |
| wsrep_ready               | ON      |
+---------------------------+---------+
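
To script this check instead of reading the table by eye, the same status values can be fetched in batch mode and parsed. A rough sketch, assuming the `mysql` client's `-N -B` options (skip column names, tab-separated batch output); the function name `wsrep_is_synced` is hypothetical:

```shell
# Reads "name<TAB>value" status lines on stdin and succeeds only when the node
# reports Synced, Primary and wsrep_ready = ON.
wsrep_is_synced() {
    awk -F'\t' '
        $1 == "wsrep_local_state_comment" && $2 == "Synced"  { ok++ }
        $1 == "wsrep_cluster_status"      && $2 == "Primary" { ok++ }
        $1 == "wsrep_ready"               && $2 == "ON"      { ok++ }
        END { exit (ok == 3 ? 0 : 1) }
    '
}

# Intended use on a node (not run here):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep%'" | wsrep_is_synced && echo joined
```

Keeping the parser separate from the `mysql` call makes it possible to try it against captured output before pointing it at a live node.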

Suggested Fix :

Please see workaround.

Environment

None

AFFECTED CS IDs

CS0042045

Activity

aristotle.po December 12, 2023 at 6:55 AM

Hi,

1. Yes, I think this should be considered solved, as there are workarounds and I can see in https://jira.percona.com/browse/PXC-4317 that the upcoming release 8.0.35-27 should fix the issue.

2. Correct, it was an issue in my environment.

> If that is true, it means that the initial workaround still works, right?

Yes, the initial workaround still works.

Kamil Holubicki December 11, 2023 at 9:15 AM

Hi,

Is my understanding correct that

  1. this issue should be solved

  2. it is not an issue, because it was the problem with your local env

Please confirm.

If that is true, it means that the initial workaround still works, right?

aristotle.po December 6, 2023 at 10:01 AM

Hi Kamil,

You are correct, we have two issues.

1) I found another workaround: running it as the mysql user worked in my test:

sudo -H -u mysql bash -c 'export PATH=$PATH:/usr/local/mysql/bin; mysqld_safe --defaults-file=/etc/my.cnf'

2) I had not created the /var/run/mysqld/ directory owned by the mysql OS user. It is OK now.
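
For the second point, the missing directory can be created up front. A minimal sketch, assuming the directory is /var/run/mysqld and the account is mysql:mysql as in the setup above; the helper name `ensure_run_dir` is hypothetical. On CentOS 7, /var/run lives on a tmpfs, so the directory would need to be recreated after a reboot:

```shell
# Create a directory (and any missing parents) and hand it to the given owner.
ensure_run_dir() {
    dir=$1
    owner=$2            # e.g. mysql:mysql
    mkdir -p "$dir" && chown "$owner" "$dir"
}

# Intended use as root on each node (not run here):
#   ensure_run_dir /var/run/mysqld mysql:mysql
```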

Kamil Holubicki December 6, 2023 at 9:16 AM

It seems we've got two problems here:
1. The one with the set -e/+e workaround, which is the same as https://jira.percona.com/browse/PXC-4317
2. The other one, with the error "Can't start server: can't check PID filepath: No such file or directory"

aristotle.po December 6, 2023 at 8:52 AM

The bug occurs only with the tarball installation. I did not observe it with the RPM install.

Done

Details

Needs QA

Yes

Created June 2, 2023 at 10:24 AM
Updated March 8, 2024 at 1:46 PM
Resolved January 12, 2024 at 2:18 PM