cluster goes into unhealthy status after clustercheck secret changed

Description

It seems that this affects the master branch only, for now.
Setup: 1x ProxySQL, 3x PXC, using images built from master.
After the cluster is up and running, patch the clustercheck secret, e.g.:
kubectl patch secret my-cluster-secrets -p="{\"data\":{\"clustercheck\": \"Y2x1c3RlcjEyMzQ1\"}}"
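Kubernetes Secret values under `data` are base64-encoded, so the string in the patch above is just an encoded password. A minimal sketch of decoding the sample value and encoding a replacement (the plaintext shown is only what the sample value decodes to, not a real credential):

```shell
# Decode the value used in the patch above:
printf '%s' 'Y2x1c3RlcjEyMzQ1' | base64 --decode   # prints: cluster12345

# Encode a replacement password the same way before patching the secret:
NEW_PASS='cluster12345'   # example value only
printf '%s' "$NEW_PASS" | base64                   # prints: Y2x1c3RlcjEyMzQ1
```

The encoded output is what goes into the `data.clustercheck` field of the patch.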

Observe that node-2 is restarted; after some time, node-0 and node-1 go into an unready state.

NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-proxysql-0                                3/3     Running   0          12m
cluster1-pxc-0                                     0/1     Running   0          11m
cluster1-pxc-1                                     0/1     Running   0          10m
cluster1-pxc-2                                     1/1     Running   0          7m43s
percona-xtradb-cluster-operator-6fc947d9bd-gsqld   1/1     Running   0          13m
Events:
  Type     Reason                  Age                   From                                                          Message
  ----     ------                  ---                   ----                                                          -------
  Normal   Scheduled               12m                   default-scheduler                                             Successfully assigned pxc-test/cluster1-pxc-0 to gke-tomislav-cluster-117-default-pool-ee53df37-vp08
  Normal   SuccessfulAttachVolume  12m                   attachdetach-controller                                       AttachVolume.Attach succeeded for volume "pvc-2f00de5c-3cea-4b5f-9f82-b819a85c09ee"
  Normal   Pulling                 11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Pulling image "perconalab/percona-xtradb-cluster-operator:1.6.0"
  Normal   Pulled                  11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Successfully pulled image "perconalab/percona-xtradb-cluster-operator:1.6.0"
  Normal   Created                 11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Created container pxc-init
  Normal   Started                 11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Started container pxc-init
  Normal   Pulling                 11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Pulling image "perconalab/percona-xtradb-cluster-operator:master-pxc8.0"
  Normal   Pulled                  11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Successfully pulled image "perconalab/percona-xtradb-cluster-operator:master-pxc8.0"
  Normal   Created                 11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Created container pxc
  Normal   Started                 11m                   kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Started container pxc
  Warning  Unhealthy               25s (x17 over 8m25s)  kubelet, gke-tomislav-cluster-117-default-pool-ee53df37-vp08  Readiness probe failed: ERROR 1045 (28000): Access denied for user 'clustercheck'@'localhost' (using password: YES) + [[ '' == \P\r\i\m\a\r\y ]] + exit 1
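The probe trace above (`ERROR 1045` followed by `[[ '' == Primary ]]` and `exit 1`) suggests the readiness check queries MySQL as the clustercheck user and requires the node to report a Primary cluster status. The operator's actual script is not reproduced here; the failing comparison can be sketched as follows, with the MySQL query shown only as an assumption in a comment:

```shell
# Sketch of the failing readiness logic, inferred from the probe trace above.
# In the real probe, the status would come from something like:
#   mysql -u clustercheck -p"$PASSWORD" -N -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
# When the password baked into the pod no longer matches the secret, that query
# fails with ERROR 1045, the status stays empty, and the check exits non-zero.
is_ready() {
  [ "$1" = "Primary" ]
}

is_ready "Primary" && echo "ready"
is_ready "" || echo "not ready: empty status after auth failure"
```

This matches the observed behavior: node-0 and node-1 keep the old password after the secret changes, so every probe run fails until the pods are recreated.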

And the cluster seems to be stuck in this state.

If I manually delete pods 0 and 1, they return to a normal state.

Environment

None

Smart Checklist

Activity

Tomislav Plavcic November 2, 2020 at 1:12 PM

One thing I might add: if I saw correctly, the init script runs after node-2 is restarted, but it does not run when node-1 is restarted. That might be worth checking.

Done

Details

Needs Review

Yes

Time tracking

6h logged

Created October 29, 2020 at 4:41 PM
Updated March 5, 2024 at 6:03 PM
Resolved February 2, 2021 at 3:49 PM
