Duplicate

Details
Assignee: Slava Sarzhan
Reporter: Vadim Tkachenko
Time tracking: 30m logged
Priority: Medium
Created October 7, 2020 at 4:38 PM
Updated March 5, 2024 at 6:05 PM
Resolved December 18, 2020 at 3:13 PM
After a specific failure (see the log below), the pod becomes unavailable. This is the output of kubectl get pods:
kubectl get pods -n pxc -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE                 NOMINATED NODE   READINESS GATES
cl2-haproxy-0   2/2     Running   3          84m   192.168.61.133   beast-node7-ubuntu   <none>           <none>
cl2-haproxy-1   2/2     Running   3          82m   192.168.71.196   beast-node8-ubuntu   <none>           <none>
cl2-haproxy-2   2/2     Running   3          82m   192.168.66.6     beast-node6-ubuntu   <none>           <none>
cl2-pxc-0       1/1     Running   0          79m   192.168.66.7     beast-node6-ubuntu   <none>           <none>
cl2-pxc-1       0/1     Running   0          83m   192.168.61.134   beast-node7-ubuntu   <none>           <none>
cl2-pxc-2       1/1     Running   3          81m   192.168.71.197   beast-node8-ubuntu   <none>           <none>
We can see that pod cl2-pxc-1 is not ready, but it is not being restarted.
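This combination is expected Kubernetes behavior: a failing readiness probe only removes the pod from Service endpoints, while only a failing liveness probe makes the kubelet restart the container. A quick way to confirm the Ready/Restart Count pair programmatically from a captured describe dump (a sketch; the inline sample stands in for the real output of kubectl describe po/cl2-pxc-1 -n pxc):

```shell
# Sketch: pull Ready / Restart Count out of captured `kubectl describe` text.
# The string below stands in for a real dump saved from the cluster.
describe_out='Ready: False
Restart Count: 0'

ready=$(printf '%s\n' "$describe_out" | awk '/^Ready:/ {print $2}')
restarts=$(printf '%s\n' "$describe_out" | awk '/^Restart Count:/ {print $3}')
echo "ready=$ready restarts=$restarts"   # ready=False restarts=0
```

ready=False with restarts=0 is exactly the signature above: the readiness probe keeps failing, but nothing ever triggers a restart.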
kubectl describe:
k describe po/cl2-pxc-1
Name:         cl2-pxc-1
Namespace:    pxc
Priority:     0
Node:         beast-node7-ubuntu/172.16.0.15
Start Time:   Wed, 07 Oct 2020 11:14:08 -0400
Labels:       app.kubernetes.io/component=pxc
              app.kubernetes.io/instance=cl2
              app.kubernetes.io/managed-by=percona-xtradb-cluster-operator
              app.kubernetes.io/name=percona-xtradb-cluster
              app.kubernetes.io/part-of=percona-xtradb-cluster
              controller-revision-hash=cl2-pxc-69cfd8579f
              statefulset.kubernetes.io/pod-name=cl2-pxc-1
Annotations:  cni.projectcalico.org/podIP: 192.168.61.134/32
              percona.com/configuration-hash: d41d8cd98f00b204e9800998ecf8427e
              percona.com/ssl-hash: ee931e5aedf277184d31dcce4214d637
              percona.com/ssl-internal-hash: c9712c413646a7be0b49b92f55996b97
Status:       Running
IP:           192.168.61.134
IPs:
  IP:  192.168.61.134
Controlled By:  StatefulSet/cl2-pxc
Init Containers:
  pxc-init:
    Container ID:  docker://e135793ff2fae5a0e8c796a2501c2fe0ad0ef9dcfad4e0deedf38e599700c1a7
    Image:         percona/percona-xtradb-cluster-operator:1.6.0
    Image ID:      docker-pullable://percona/percona-xtradb-cluster-operator@sha256:4ce6c8a55d8ed3a60c96c406ee103f70d303ebd97237e53d0e38fde75f848683
    Port:          <none>
    Host Port:     <none>
    Command:
      /pxc-init-entrypoint.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 07 Oct 2020 11:14:10 -0400
      Finished:     Wed, 07 Oct 2020 11:14:10 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        600m
      memory:     1G
    Environment:  <none>
    Mounts:
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l7m9h (ro)
  pxc-init-unsafe:
    Container ID:  docker://dadb5bb3f034a6b9c081d128e8c52aa9591d42b88ec4d14ff3a3b9e4fdaadda3
    Image:         percona/percona-xtradb-cluster:8.0.20-11.1
    Image ID:      docker-pullable://percona/percona-xtradb-cluster@sha256:54b1b2f5153b78b05d651034d4603a13e685cbb9b45bfa09a39864fa3f169349
    Ports:         3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP, 33062/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /var/lib/mysql/unsafe-bootstrap.sh
    Args:
      mysqld
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 07 Oct 2020 11:14:11 -0400
      Finished:     Wed, 07 Oct 2020 11:14:11 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     600m
      memory:  1G
    Environment:
      PXC_SERVICE:              cl2-pxc-unready
      MONITOR_HOST:             %
      MYSQL_ROOT_PASSWORD:      <set to the key 'root' in secret 'internal-cl2'>          Optional: false
      XTRABACKUP_PASSWORD:      <set to the key 'xtrabackup' in secret 'internal-cl2'>    Optional: false
      MONITOR_PASSWORD:         <set to the key 'monitor' in secret 'internal-cl2'>       Optional: false
      CLUSTERCHECK_PASSWORD:    <set to the key 'clustercheck' in secret 'internal-cl2'>  Optional: false
      OPERATOR_ADMIN_PASSWORD:  <set to the key 'operator' in secret 'internal-cl2'>      Optional: false
    Mounts:
      /etc/my.cnf.d from auto-config (rw)
      /etc/mysql/ssl from ssl (rw)
      /etc/mysql/ssl-internal from ssl-internal (rw)
      /etc/mysql/vault-keyring-secret from vault-keyring-secret (rw)
      /etc/percona-xtradb-cluster.conf.d from config (rw)
      /tmp from tmp (rw)
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l7m9h (ro)
Containers:
  pxc:
    Container ID:  docker://58f0ae31d9308214cad4934fd7b66acbfa916f377e9ad4b3a5a748eed794306c
    Image:         percona/percona-xtradb-cluster:8.0.20-11.1
    Image ID:      docker-pullable://percona/percona-xtradb-cluster@sha256:54b1b2f5153b78b05d651034d4603a13e685cbb9b45bfa09a39864fa3f169349
    Ports:         3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP, 33062/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /var/lib/mysql/pxc-entrypoint.sh
    Args:
      mysqld
    State:          Running
      Started:      Wed, 07 Oct 2020 11:14:12 -0400
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      600m
      memory:   1G
    Liveness:   exec [/var/lib/mysql/liveness-check.sh] delay=300s timeout=5s period=10s #success=1 #failure=3
    Readiness:  exec [/var/lib/mysql/readiness-check.sh] delay=15s timeout=15s period=30s #success=1 #failure=5
    Environment:
      PXC_SERVICE:              cl2-pxc-unready
      MONITOR_HOST:             %
      MYSQL_ROOT_PASSWORD:      <set to the key 'root' in secret 'internal-cl2'>          Optional: false
      XTRABACKUP_PASSWORD:      <set to the key 'xtrabackup' in secret 'internal-cl2'>    Optional: false
      MONITOR_PASSWORD:         <set to the key 'monitor' in secret 'internal-cl2'>       Optional: false
      CLUSTERCHECK_PASSWORD:    <set to the key 'clustercheck' in secret 'internal-cl2'>  Optional: false
      OPERATOR_ADMIN_PASSWORD:  <set to the key 'operator' in secret 'internal-cl2'>      Optional: false
    Mounts:
      /etc/my.cnf.d from auto-config (rw)
      /etc/mysql/ssl from ssl (rw)
      /etc/mysql/ssl-internal from ssl-internal (rw)
      /etc/mysql/vault-keyring-secret from vault-keyring-secret (rw)
      /etc/percona-xtradb-cluster.conf.d from config (rw)
      /tmp from tmp (rw)
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l7m9h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-cl2-pxc-1
    ReadOnly:   false
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cl2-pxc
    Optional:  true
  ssl-internal:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-ssl-internal
    Optional:    true
  ssl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-ssl
    Optional:    false
  auto-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      auto-cl2-pxc
    Optional:  true
  vault-keyring-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  keyring-secret-vault
    Optional:    true
  default-token-l7m9h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l7m9h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                         Message
  ----     ------     ----                  ----                         -------
  Warning  Unhealthy  53m                   kubelet, beast-node7-ubuntu  Readiness probe failed: + [[ Primary == \P\r\i\m\a\r\y ]] + [[ 5 -eq 4 ]] + [[ 5 -eq 2 ]] + exit 1
  Warning  Unhealthy  110s (x103 over 52m)  kubelet, beast-node7-ubuntu  Readiness probe failed: + [[ Disconnected == \P\r\i\m\a\r\y ]] + exit 1
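The probe traces in the Events above suggest the shape of the readiness decision: a cluster-status string must equal Primary, and a numeric node state must be 4 or 2, otherwise the script exits 1. A minimal sketch of that logic (the function and argument names are ours, not from readiness-check.sh; the 4/2 values would correspond to Galera's Synced and Donor/Desynced states):

```shell
# Hedged sketch of the decision visible in the probe trace.
# `is_ready`, `status`, and `state` are illustrative names only.
is_ready() {
  status="$1"
  state="$2"
  # Node must belong to a primary component...
  [ "$status" = "Primary" ] || return 1
  # ...and report state 4 (Synced) or 2 (Donor/Desynced).
  [ "$state" -eq 4 ] || [ "$state" -eq 2 ] || return 1
  return 0
}

is_ready Primary 4 && echo "ready"              # a synced node passes
is_ready Primary 5 || echo "not ready"          # first event: Primary but state 5
is_ready Disconnected 4 || echo "not ready"     # repeating event: Disconnected
```

Under this reading, the pod first failed because the node left the acceptable state set while still Primary, and then kept failing every 30s once the node dropped to Disconnected, without the liveness probe ever firing.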
Log of the node failure:
2020-10-07T15:18:14.093934Z 0 [Note] [MY-000000] [Galera] async IST sender served
2020-10-07T15:18:14.098880Z 0 [Note] [MY-000000] [Galera] 1.0 (cl2-pxc-0): State transfer from 2.0 (cl2-pxc-1) complete.
2020-10-07T15:18:14.099847Z 0 [Note] [MY-000000] [Galera] Member 1.0 (cl2-pxc-0) synced with group.
2020-10-07T15:42:51.252716Z 0 [Note] [MY-000000] [Galera] declaring 4d930040 at ssl://192.168.66.7:4567 stable
2020-10-07T15:42:51.252813Z 0 [Note] [MY-000000] [Galera] forgetting 0161f386 (ssl://192.168.71.197:4567)
2020-10-07T15:42:51.253609Z 0 [Note] [MY-000000] [Galera] Node 4d930040 state primary
2020-10-07T15:42:51.257245Z 2 [ERROR] [MY-010584] [Repl] Slave SQL: Could not execute Update_rows event on table sbtest.warehouse9; Can't find record in 'warehouse9', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 0, Error_code: MY-001032
2020-10-07T15:42:51.257277Z 2 [Warning] [MY-000000] [WSREP] Event 3 Update_rows apply failed: 120, seqno 76599
2020-10-07T15:42:51.257968Z 2 [Note] [MY-000000] [Galera] Failed to apply write set: gtid: 9eb2648f-08af-11eb-ab37-7fa7bf32cdd9:76599 server_id: 4d930040-08b0-11eb-b309-5af603144c68 client_id: 2349 trx_id: 86802 flags: 3
2020-10-07T15:42:51.258561Z 2 [Note] [MY-000000] [Galera] Closing send monitor...
2020-10-07T15:42:51.258580Z 2 [Note] [MY-000000] [Galera] Closed send monitor.
2020-10-07T15:42:51.258594Z 2 [Note] [MY-000000] [Galera] gcomm: terminating thread
2020-10-07T15:42:51.258611Z 2 [Note] [MY-000000] [Galera] gcomm: joining thread
2020-10-07T15:42:51.258724Z 2 [Note] [MY-000000] [Galera] gcomm: closing backend
2020-10-07T15:42:51.290386Z 10 [ERROR] [MY-010584] [Repl] Slave SQL: Could not execute Update_rows event on table sbtest.warehouse5; Can't find record in 'warehouse5', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 0, Error_code: MY-001032
2020-10-07T15:42:51.290412Z 10 [Warning] [MY-000000] [WSREP] Event 3 Update_rows apply failed: 120, seqno 76600
2020-10-07T15:42:51.290898Z 10 [Note] [MY-000000] [Galera] Failed to apply write set: gtid: 9eb2648f-08af-11eb-ab37-7fa7bf32cdd9:76600 server_id: 4d930040-08b0-11eb-b309-5af603144c68 client_id: 2388 trx_id: 86791 flags: 3
2020-10-07T15:42:51.324996Z 2 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,4d930040,6)
memb {
    4d930040,0
    c2b193c8,0
}
joined {
}
left {
}
partitioned {
    0161f386,0
}
)
2020-10-07T15:42:51.325043Z 2 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2020-10-07T15:42:51.325822Z 2 [Note] [MY-000000] [Galera] forgetting 0161f386 (ssl://192.168.71.197:4567)
2020-10-07T15:42:52.327208Z 2 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,4d930040,6)
memb {
    c2b193c8,0
}
joined {
}
left {
}
partitioned {
    4d930040,0
}
)
2020-10-07T15:42:52.327236Z 2 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2020-10-07T15:42:52.327260Z 2 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2020-10-07T15:42:52.327430Z 2 [Note] [MY-000000] [Galera] gcomm: closed
2020-10-07T15:42:52.327552Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2020-10-07T15:42:52.327630Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
2020-10-07T15:42:52.327723Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-10-07T15:42:52.327821Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2020-10-07T15:42:52.327847Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
2020-10-07T15:42:52.327866Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> OPEN (TO: 76659)
2020-10-07T15:42:52.327902Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.
2020-10-07T15:42:52.327952Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [0, 0]
2020-10-07T15:42:52.327980Z 0 [Note] [MY-000000] [Galera] Received SELF-LEAVE. Closing connection.
2020-10-07T15:42:52.328003Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: -1)
2020-10-07T15:42:52.328025Z 0 [Note] [MY-000000] [Galera] RECV thread exiting 0: Success
2020-10-07T15:42:52.328048Z 10 [Note] [MY-000000] [Galera] ####### processing CC -1, local, ordered
2020-10-07T15:42:52.328114Z 10 [Note] [MY-000000] [Galera] ####### My UUID: c2b193c8-08af-11eb-b3f1-6a3d4c9d4080
2020-10-07T15:42:52.328155Z 2 [Note] [MY-000000] [Galera] recv_thread() joined.
2020-10-07T15:42:52.328156Z 10 [Note] [MY-000000] [Galera] ####### ST not required
2020-10-07T15:42:52.328165Z 2 [Note] [MY-000000] [Galera] Closing replication queue.
2020-10-07T15:42:52.328187Z 2 [Note] [MY-000000] [Galera] Closing slave action queue.
2020-10-07T15:42:52.328242Z 10 [Note] [MY-000000] [Galera] ================================================
View:
  id: 9eb2648f-08af-11eb-ab37-7fa7bf32cdd9:-1
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
    0: c2b193c8-08af-11eb-b3f1-6a3d4c9d4080, cl2-pxc-1
=================================================
2020-10-07T15:42:52.328271Z 10 [Note] [MY-000000] [Galera] Non-primary view
2020-10-07T15:42:52.328291Z 10 [Note] [MY-000000] [WSREP] Server status change synced -> connected
2020-10-07T15:42:52.329196Z 10 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-10-07T15:42:52.331451Z 10 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-10-07T15:42:52.331537Z 10 [Note] [MY-000000] [Galera] ####### processing CC -1, local, ordered
2020-10-07T15:42:52.331573Z 10 [Note] [MY-000000] [Galera] ####### My UUID: c2b193c8-08af-11eb-b3f1-6a3d4c9d4080
2020-10-07T15:42:52.331595Z 10 [Note] [MY-000000] [Galera] ####### ST not required
2020-10-07T15:42:52.331632Z 10 [Note] [MY-000000] [Galera] ================================================
View:
  id: 9eb2648f-08af-11eb-ab37-7fa7bf32cdd9:-1
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: yes
  own_index: -1
  members(0):
=================================================
2020-10-07T15:42:52.331658Z 10 [Note] [MY-000000] [Galera] Non-primary view
2020-10-07T15:42:52.331677Z 10 [Note] [MY-000000] [WSREP] Server status change connected -> disconnected
2020-10-07T15:42:52.331699Z 10 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-10-07T15:42:52.331721Z 10 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-10-07T15:42:52.338267Z 2 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 6 thd: 2
2020-10-07T15:42:52.342107Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2020-10-07T15:42:52.342187Z 10 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
2020-10-07T15:42:52.342234Z 10 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 10
2020-10-07T15:43:59.501050Z 1579 [Warning] [MY-010056] [Server] Host name '192-168-71-196.cl2-haproxy-replicas.pxc.svc.cluster.local' could not be resolved: Name or service not known
2020-10-07T15:44:08.665507Z 1586 [Warning] [MY-010056] [Server] Host name '192-168-71-196.cl2-haproxy-replicas.pxc.svc.cluster.local' could not be resolved: Name or service not known
2020-10-07T15:44:18.710482Z 1595 [Warning] [MY-010056] [Server] Host name '192-168-71-196.cl2-haproxy-replicas.pxc.svc.cluster.local' could not be resolved: Name or service not known
2020-10-07T15:44:22.049759Z 1599 [Warning] [MY-010056] [Server] Host name '192-168-71-196.cl2-haproxy-replicas.pxc.svc.cluster.local' could not be resolved: Name or service not known
2020-10-07T15:44:25.430895Z 1602 [Warning] [MY-010056] [Server] Host name '192-168-71-196.cl2-haproxy-replicas.pxc.svc.cluster.local' could not be resolved: Name or service not known
2020-10-07T15:44:28.761998Z 1605 [Warning] [MY-010056] [Server] Host name '192-168-71-196.cl2-haproxy-replicas.pxc.svc.cluster.local' could not be resolved: Name or service not known