Restore of the cluster results in replica stuck in CrashLoopBackOff
Description
Environment
None
Attachments
1
- 07 Nov 2022, 05:34 PM
Activity
Slava Sarzhan December 7, 2022 at 9:33 AM
@Fernando Laudares Carmagos thanks for the report. The issue has been fixed, and the fix will be available in the next release.
Tomislav Plavcic November 8, 2022 at 9:47 AM
More info from Slack:
Clearly, the culprit there is the Orchestrator: it is unable to assign a new primary, at least not when the primary at the time the backup was taken was cluster1-mysql-2 (the third instance).
During the restore process, cluster1-mysql-0 would come up first, but not as a primary: it would still have replication pointed at cluster1-mysql-2 and read_only enabled, but at least it would come up. The second mysql instance, cluster1-mysql-1, would always get stuck with the error described in the bug above. Since replication would fail to start, mysql would be killed (by Orchestrator, I suppose), the pod would enter CrashLoopBackOff and remain there, and cluster1-mysql-2 would never get a chance to start.
This should be very easy to reproduce with a simple restore process where the primary at the time of the backup was cluster1-mysql-2.
I could reproduce the issue as well:
$ kubectl get pods
NAME                                             READY   STATUS             RESTARTS       AGE
cluster1-haproxy-0                               2/2     Running            0              8m3s
cluster1-haproxy-1                               2/2     Running            0              7m58s
cluster1-haproxy-2                               2/2     Running            0              7m55s
cluster1-mysql-0                                 3/3     Running            0              9m28s
cluster1-mysql-1                                 2/3     CrashLoopBackOff   12 (21s ago)   8m48s
cluster1-orc-0                                   2/2     Running            0              9m28s
cluster1-orc-1                                   2/2     Running            0              8m52s
cluster1-orc-2                                   2/2     Running            0              8m17s
percona-server-mysql-operator-57cccddf49-x5dj6   1/1     Running            0              38m
xb-backup1-s3-us-west-kkl4c                      0/1     Completed          0              17m
xb-restore-restore1-9fgpw                        0/1     Completed          0              10m
$ kubectl get ps-restore
NAME       STATE       AGE
restore1   Succeeded   12m
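When checking a reproduction, it can help to filter the pod list down to the pods that are crash-looping. The following is a minimal sketch, not part of the original report: the helper name crashlooping_pods is hypothetical, and it assumes the default `kubectl get pods` column order shown above (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: print the names of pods whose STATUS column is
# CrashLoopBackOff. Reads `kubectl get pods` output on stdin.
crashlooping_pods() {
  # NR > 1 skips the header row; $1 is NAME and $3 is STATUS.
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get pods | crashlooping_pods
```

With the pod list above, this would be expected to print only cluster1-mysql-1.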
+ exec mysqld
2022-11-08T09:38:36.057427Z 0 [Warning] [MY-011068] [Server] The syntax '--skip-host-cache' is deprecated and will be removed in a future release. Please use SET GLOBAL host_cache_size=0 instead.
2022-11-08T09:38:36.060443Z 0 [Warning] [MY-010097] [Server] Insecure configuration for --secure-log-path: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path.
2022-11-08T09:38:36.060580Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.30-22) starting as process 1
2022-11-08T09:38:36.070024Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2022-11-08T09:38:36.512324Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2022-11-08T09:38:36.678311Z 0 [Warning] [MY-010918] [Repl] 'rpl_semi_sync_slave' is deprecated and will be removed in a future release. Please use rpl_semi_sync_replica instead.
2022-11-08T09:38:36.678413Z 0 [Warning] [MY-010918] [Repl] 'rpl_semi_sync_master' is deprecated and will be removed in a future release. Please use rpl_semi_sync_source instead.
2022-11-08T09:38:36.786279Z 0 [Warning] [MY-010068] [Server] CA certificate /etc/mysql/mysql-tls-secret/ca.crt is self signed.
2022-11-08T09:38:36.786334Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2022-11-08T09:38:36.798896Z 0 [Warning] [MY-013595] [Server] Failed to initialize TLS for channel: mysql_admin. See below for the description of exact issue.
2022-11-08T09:38:36.798950Z 0 [Warning] [MY-010069] [Server] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
2022-11-08T09:38:36.798962Z 0 [System] [MY-013603] [Server] No TLS configuration was given for channel mysql_admin; re-using TLS configuration of channel mysql_main.
2022-11-08T09:38:36.842786Z 0 [Warning] [MY-010604] [Repl] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=cluster1-mysql-1-relay-bin' to avoid this problem.
2022-11-08T09:38:36.855520Z 5 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2022-11-08T09:38:36.863876Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/lib/mysql/mysqlx.sock
2022-11-08T09:38:36.863941Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.30-22' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 22, Revision 2072ebcdf97.
2022-11-08T09:38:36.863970Z 0 [System] [MY-013292] [Server] Admin interface ready for connections, address: '10.58.96.19' port: 33062
2022-11-08T09:38:36.872585Z 5 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error connecting to master 'replication@cluster1-mysql-2.cluster1-mysql.test:3306' - retry-time: 60 retries: 1 message: Unknown MySQL server host 'cluster1-mysql-2.cluster1-mysql.test' (-2), Error_code: MY-002005
2022-11-08T09:38:51.420378Z 22 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
2022-11-08T09:38:51.811943Z 22 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Finished
2022-11-08T09:39:00.145096Z 22 [ERROR] [MY-013462] [Server] Clone shutting down server as RESTART failed. Please start server to complete clone operation.
2022-11-08T09:39:00.920231Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.30-22) Percona Server (GPL), Release 22, Revision 2072ebcdf97.
Done
Created November 7, 2022 at 5:35 PM
Updated February 29, 2024 at 8:07 PM
Resolved January 4, 2023 at 10:01 AM
After following https://docs.percona.com/percona-operator-for-mysql/ps/backups.html :
Log shows:
$ kubectl logs cluster1-mysql-1
(...)
2022-11-07T17:22:09.453062Z 5 [ERROR] [MY-010584] [Repl] Slave I/O for channel '': error connecting to master 'replication@cluster1-mysql-2.cluster1-mysql.default:3306' - retry-time: 60 retries: 1 message: Unknown MySQL server host 'cluster1-mysql-2.cluster1-mysql.default' (-2), Error_code: MY-002005
where cluster1-mysql-2 was the old primary ...
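To confirm which source host a restored replica is still trying to reach, the hostname can be pulled out of the MY-010584 error line. This is only a sketch under assumptions: the helper name unknown_source_hosts is hypothetical, and the pattern relies on the exact message wording seen in the log above.

```shell
#!/bin/sh
# Hypothetical helper: list the replication source hosts that mysqld
# reported as unresolvable. Reads a mysqld error log on stdin.
unknown_source_hosts() {
  # Keep only replication I/O connection errors (MY-010584), then capture
  # the quoted hostname after "Unknown MySQL server host".
  grep -F 'MY-010584' |
    sed -n "s/.*Unknown MySQL server host '\([^']*\)'.*/\1/p" |
    sort -u
}

# Usage (requires access to the cluster):
#   kubectl logs cluster1-mysql-1 | unknown_source_hosts
```

For the log line above, this would be expected to print cluster1-mysql-2.cluster1-mysql.default, the old primary.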