Restore of the cluster results in replica stuck in CrashLoopBackOff

Description

Observed after following https://docs.percona.com/percona-operator-for-mysql/ps/backups.html:

The log shows:

$ kubectl logs cluster1-mysql-1
(...)
2022-11-07T17:22:09.453062Z 5 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error connecting to master 'replication@cluster1-mysql-2.cluster1-mysql.default:3306' - retry-time: 60 retries: 1 message: Unknown MySQL server host 'cluster1-mysql-2.cluster1-mysql.default' (-2), Error_code: MY-002005

where cluster1-mysql-2 was the old primary ...
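
When triaging a pod with this symptom, it can help to pull out only the host name that the replica fails to resolve. The following is a minimal sketch, not part of the original report: the function name unreachable_source_hosts is hypothetical, and it assumes the log line format is exactly the MY-010584 message quoted above.

```shell
# Hypothetical helper (not from the original report): reads mysqld log
# lines on stdin and prints each distinct host named in an
# "Unknown MySQL server host" message on a MY-010584 replication error.
unreachable_source_hosts() {
  grep 'MY-010584' \
    | sed -n "s/.*Unknown MySQL server host '\([^']*\)'.*/\1/p" \
    | sort -u
}
```

Possible usage, assuming the same pod name as in the report: kubectl logs cluster1-mysql-1 | unreachable_source_hosts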

Environment

None

Attachments

1
  • 07 Nov 2022, 05:34 PM

Activity

Slava Sarzhan December 7, 2022 at 9:33 AM

Thanks for the report. The issue has been fixed, and the fix will be available in the next release.

Tomislav Plavcic November 8, 2022 at 9:47 AM

More info from slack:

Clearly, the culprit there is the Orchestrator: it is unable to assign a new primary, at least not when the primary at the moment the backup was taken was cluster1-mysql-2 (the 3rd instance). During the restore process, cluster1-mysql-0 would come up first, but not as a primary: it would still have replication pointed to cluster1-mysql-2 and read_only enabled, but at least it would come up. The second mysql instance, cluster1-mysql-1, would always get stuck with the error described in the bug above. Since replication would fail to start, mysql would be killed (by Orchestrator, I suppose), and the pod would then enter CrashLoopBackOff and remain there, leaving cluster1-mysql-2 without a chance to start. This should be very easy to reproduce with a simple restore where the primary at the time of the backup was cluster1-mysql-2.

I could reproduce the issue as well:

$ kubectl get pods
NAME                                             READY   STATUS             RESTARTS       AGE
cluster1-haproxy-0                               2/2     Running            0              8m3s
cluster1-haproxy-1                               2/2     Running            0              7m58s
cluster1-haproxy-2                               2/2     Running            0              7m55s
cluster1-mysql-0                                 3/3     Running            0              9m28s
cluster1-mysql-1                                 2/3     CrashLoopBackOff   12 (21s ago)   8m48s
cluster1-orc-0                                   2/2     Running            0              9m28s
cluster1-orc-1                                   2/2     Running            0              8m52s
cluster1-orc-2                                   2/2     Running            0              8m17s
percona-server-mysql-operator-57cccddf49-x5dj6   1/1     Running            0              38m
xb-backup1-s3-us-west-kkl4c                      0/1     Completed          0              17m
xb-restore-restore1-9fgpw                        0/1     Completed          0              10m

$ kubectl get ps-restore
NAME       STATE       AGE
restore1   Succeeded   12m

+ exec mysqld
2022-11-08T09:38:36.057427Z 0 [Warning] [MY-011068] [Server] The syntax '--skip-host-cache' is deprecated and will be removed in a future release. Please use SET GLOBAL host_cache_size=0 instead.
2022-11-08T09:38:36.060443Z 0 [Warning] [MY-010097] [Server] Insecure configuration for --secure-log-path: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path.
2022-11-08T09:38:36.060580Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.30-22) starting as process 1
2022-11-08T09:38:36.070024Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2022-11-08T09:38:36.512324Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2022-11-08T09:38:36.678311Z 0 [Warning] [MY-010918] [Repl] 'rpl_semi_sync_slave' is deprecated and will be removed in a future release. Please use rpl_semi_sync_replica instead.
2022-11-08T09:38:36.678413Z 0 [Warning] [MY-010918] [Repl] 'rpl_semi_sync_master' is deprecated and will be removed in a future release. Please use rpl_semi_sync_source instead.
2022-11-08T09:38:36.786279Z 0 [Warning] [MY-010068] [Server] CA certificate /etc/mysql/mysql-tls-secret/ca.crt is self signed.
2022-11-08T09:38:36.786334Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2022-11-08T09:38:36.798896Z 0 [Warning] [MY-013595] [Server] Failed to initialize TLS for channel: mysql_admin. See below for the description of exact issue.
2022-11-08T09:38:36.798950Z 0 [Warning] [MY-010069] [Server] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
2022-11-08T09:38:36.798962Z 0 [System] [MY-013603] [Server] No TLS configuration was given for channel mysql_admin; re-using TLS configuration of channel mysql_main.
2022-11-08T09:38:36.842786Z 0 [Warning] [MY-010604] [Repl] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=cluster1-mysql-1-relay-bin' to avoid this problem.
2022-11-08T09:38:36.855520Z 5 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2022-11-08T09:38:36.863876Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/lib/mysql/mysqlx.sock
2022-11-08T09:38:36.863941Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.30-22' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 22, Revision 2072ebcdf97.
2022-11-08T09:38:36.863970Z 0 [System] [MY-013292] [Server] Admin interface ready for connections, address: '10.58.96.19' port: 33062
2022-11-08T09:38:36.872585Z 5 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error connecting to master 'replication@cluster1-mysql-2.cluster1-mysql.test:3306' - retry-time: 60 retries: 1 message: Unknown MySQL server host 'cluster1-mysql-2.cluster1-mysql.test' (-2), Error_code: MY-002005
2022-11-08T09:38:51.420378Z 22 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
2022-11-08T09:38:51.811943Z 22 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Finished
2022-11-08T09:39:00.145096Z 22 [ERROR] [MY-013462] [Server] Clone shutting down server as RESTART failed. Please start server to complete clone operation.
2022-11-08T09:39:00.920231Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.30-22) Percona Server (GPL), Release 22, Revision 2072ebcdf97.
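
When reproducing this, the affected pod can be picked out of a long kubectl get pods listing with a small filter. This is only a sketch under stated assumptions: the function name crashlooping_pods is hypothetical, and it assumes the default column layout where STATUS is the third column and the first row is the header.

```shell
# Hypothetical helper (not from the original report): reads
# `kubectl get pods` output on stdin, skips the header row, and prints
# the name of every pod whose STATUS column is CrashLoopBackOff.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}
```

Possible usage: kubectl get pods | crashlooping_pods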
Done

Details

Assignee

Reporter

Needs QA

Yes

Fix versions

Affects versions

Priority

Created November 7, 2022 at 5:35 PM
Updated February 29, 2024 at 8:07 PM
Resolved January 4, 2023 at 10:01 AM