Upgrade from Operator 1.3 to 1.4 ends up with a cluster without any replicas

Description

Independently tested and reproduced the customer case.

 

Existing cluster with PGO 1.3

$ kubectl get pods
NAME                                             READY   STATUS      RESTARTS   AGE
postgres-operator-64b576cf76-4g54w               4/4     Running     0          16m
pgo-deploy--1-9nh9f                              0/1     Completed   0          17m
cluster1-backrest-shared-repo-5c78d4fb49-9dnxs   1/1     Running     0          14m
cluster1-pgbouncer-6b9fbb7954-btq2z              1/1     Running     0          12m
cluster1-pgbouncer-6b9fbb7954-5tfdc              1/1     Running     0          12m
cluster1-pgbouncer-6b9fbb7954-xzqq8              1/1     Running     0          12m
backrest-backup-cluster1--1-2mps2                0/1     Completed   0          10m
cluster1-694699d8f4-vmq7b                        1/1     Running     0          14m
cluster1-repl1-6b9677bcf7-2pwp9                  1/1     Running     0          10m
cluster1-repl2-55c6dd8f5-rjswq                   1/1     Running     0          10m

 

Connect to primary and verify everything

$ kubectl exec -it cluster1-694699d8f4-vmq7b -- bash
bash-4.4$
bash-4.4$
bash-4.4$ psql
psql (14.4)
Type "help" for help.
postgres=# create table t1 (id int,nm varchar);
CREATE TABLE
postgres=# insert into t1 values (1,'Jobin');
INSERT 0 1

 

  • Please note the PostgreSQL version: 14.4

Pause the cluster

$ kubectl edit perconapgclusters.pg.percona.com cluster1
perconapgcluster.pg.percona.com/cluster1 edited
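The ticket does not show which field was changed in the editor. As a rough, non-interactive sketch of the same step, and assuming the pause is controlled by the `spec.pause` key of the custom resource (an assumption, not confirmed in this ticket), a merge patch could be built and applied like this. The function names and the `KUBECTL` override variable are hypothetical helpers.

```shell
# Sketch only: assumes the cluster is paused through spec.pause (not shown in the ticket).
# KUBECTL can be overridden, for example to point at a wrapper or a test stub.
KUBECTL="${KUBECTL:-kubectl}"

# Print the JSON merge patch for the requested pause state (true or false).
pause_patch() {
  case "$1" in
    true|false) printf '{"spec":{"pause":%s}}\n' "$1" ;;
    *) echo "usage: pause_patch true|false" >&2; return 1 ;;
  esac
}

# Apply the patch to the named perconapgcluster resource in the current namespace.
set_cluster_pause() {
  cluster="$1"
  state="$2"
  patch=$(pause_patch "$state") || return 1
  "$KUBECTL" patch perconapgcluster "$cluster" --type=merge -p "$patch"
}

# Usage (hypothetical): set_cluster_pause cluster1 true
```

A patch leaves an auditable one-line command in the reproduction steps, which `kubectl edit` does not.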

Verify that everything is paused

$ kubectl get pods
NAME                                 READY   STATUS      RESTARTS   AGE
postgres-operator-64b576cf76-4g54w   4/4     Running     0          22m
pgo-deploy--1-9nh9f                  0/1     Completed   0          23m
backrest-backup-cluster1--1-2mps2    0/1     Completed   0          16m

 

 

Delete the following from the existing cluster

kubectl delete \
  serviceaccounts/pgo-deployer-sa \
  clusterroles/pgo-deployer-cr \
  configmaps/pgo-deployer-cm \
  configmaps/pgo-config \
  clusterrolebindings/pgo-deployer-crb \
  jobs.batch/pgo-deploy \
  deployment/postgres-operator

 

 

Get the files for the Operator 1.4

cd ~/Downloads/
rm -rf percona-postgresql-operator
git clone -b v1.4.0 https://github.com/percona/percona-postgresql-operator
cd percona-postgresql-operator

 

 

Edit the `operator.yaml` file of 1.4 to modify `DEPLOY_ACTION` to `update`

vi deploy/operator.yaml

 

...
env:
  - name: DEPLOY_ACTION
    value: update
...
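The manual edit above can also be scripted. The following is a minimal sketch, assuming the `value:` line sits directly below the `name: DEPLOY_ACTION` line, as in the fragment shown. The function name `set_deploy_action` and the output file name in the usage comment are hypothetical.

```shell
# Sketch only: rewrites the "value:" line that directly follows "name: DEPLOY_ACTION".
# Reads YAML on stdin and writes the modified YAML to stdout.
# The action must not contain "/" because it is spliced into a sed expression.
set_deploy_action() {
  action="$1"
  sed "/name: DEPLOY_ACTION/{
n
s/value: .*/value: ${action}/
}"
}

# Usage (hypothetical file names):
#   set_deploy_action update < deploy/operator.yaml > operator-update.yaml
```

Other `value:` lines in the file are left untouched because the substitution only runs on the line right after the `DEPLOY_ACTION` entry.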

Create the operator

$ kubectl apply -f operator.yaml
serviceaccount/pgo-deployer-sa created
clusterrole.rbac.authorization.k8s.io/pgo-deployer-cr created
configmap/pgo-deployer-cm created
clusterrolebinding.rbac.authorization.k8s.io/pgo-deployer-crb created
job.batch/pgo-deploy created

Wait for two minutes and verify that the Operator is ready

sleep 120
kubectl get pods
NAME                                 READY   STATUS      RESTARTS   AGE
backrest-backup-cluster1--1-2mps2    0/1     Completed   0          27m
postgres-operator-6658579bdd-jlm76   4/4     Running     0          2m22s
pgo-deploy--1-qrbzn                  0/1     Completed   0          2m59s
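A fixed `sleep 120` can be too short or too long depending on the environment. As a hedged alternative, the sketch below waits on the `postgres-operator` deployment (the name used in the delete step above) with `kubectl rollout status`. The function name, the default timeout of 180 seconds, and the `KUBECTL` override variable are assumptions for illustration. Note that the deployment is created by the `pgo-deploy` job, so the call can fail if it runs before the deployment exists.

```shell
# Sketch only: waits for the operator deployment in the current namespace.
# KUBECTL can be overridden, for example to point at a wrapper or a test stub.
KUBECTL="${KUBECTL:-kubectl}"

# Block until deployment/postgres-operator is rolled out or the timeout (seconds) expires.
wait_for_operator() {
  timeout_seconds="${1:-180}"
  "$KUBECTL" rollout status deployment/postgres-operator --timeout="${timeout_seconds}s"
}

# Usage (hypothetical): wait_for_operator 300 && kubectl get pods
```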

 

 

Verify that `cr.yaml` references compatible images (PostgreSQL version), then apply the `cr.yaml`

$ kubectl apply -f cr.yaml
perconapgcluster.pg.percona.com/cluster1 configured

 

 

Wait for at least 3 minutes and check the status

$ kubectl get pods
NAME                                             READY   STATUS      RESTARTS   AGE
backrest-backup-cluster1--1-2mps2                0/1     Completed   0          36m
postgres-operator-6658579bdd-jlm76               4/4     Running     0          11m
pgo-deploy--1-qrbzn                              0/1     Completed   0          11m
cluster1-backrest-shared-repo-75f4dd76d8-hs5jr   1/1     Running     0          7m4s
cluster1-pgbouncer-6875db664f-sxn9c              1/1     Running     0          4m19s
cluster1-pgbouncer-6875db664f-nr229              1/1     Running     0          4m19s
cluster1-pgbouncer-6875db664f-6l9dw              1/1     Running     0          4m19s
cluster1-backrest-shared-repo-75f4dd76d8-gdz82   1/1     Running     0          4m19s
cluster1-backrest-shared-repo-75f4dd76d8-7jn4g   1/1     Running     0          4m19s
cluster1-649b45fd6d-5nms5                        1/1     Running     0          5m5s

We can see that there are no replica pods.
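To make this check repeatable, the pod listing can be counted instead of read by eye. This is a minimal sketch that relies on the replica pod naming seen before the upgrade (`cluster1-repl1-...`, `cluster1-repl2-...`). The function name `count_replica_pods` is a hypothetical helper.

```shell
# Sketch only: counts Running pods whose name starts with "<cluster>-repl".
# Reads `kubectl get pods` output on stdin; the header line is ignored because it never matches.
count_replica_pods() {
  cluster="$1"
  awk -v prefix="${cluster}-repl" \
    'index($1, prefix) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Usage: kubectl get pods | count_replica_pods cluster1
```

Run against the listing before the upgrade, this would report 2. Against the listing above it would report 0, which matches the reported problem.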

 

Environment

None

AFFECTED CS IDs

CS0037817

Activity

Slava Sarzhan July 19, 2023 at 1:02 PM

I can confirm that we have a bug with pause in 1.4.0, and we need to fix it. Please use the new version of the official upgrade instructions at https://docs.percona.com/percona-operator-for-postgresql/update.html to update the operators. You will have an upgrade without any downtime at all.

Jobin Augustine July 18, 2023 at 7:46 AM

Hi,
Regarding:
> But why do you use this procedure? We have other steps in the doc https://docs.percona.com/percona-operator-for-postgresql/update.html for version 1.4.0
> The steps from the doc work for me.

Does it really work for you? I am surprised. Another ticket has already been created for the poor documentation: https://jira.percona.com/browse/K8SPG-403
That official document contains only one step of the entire multi-step manual process for "Upgrading the Operator".
Pre and post steps are required. Please see the comments by Lalit and Ege on the ticket https://perconadev.atlassian.net/browse/K8SPG-403#icft=K8SPG-403
The single step mentioned in the official documentation is also part of the steps I documented here. Please see the section: Edit the `operator.yaml` file of 1.4 to modify `DEPLOY_ACTION` to `update`

natalia.marukovich July 17, 2023 at 9:28 AM
Edited


I'm rechecking your steps. But why do you use this procedure? We have other steps in the doc https://docs.percona.com/percona-operator-for-postgresql/update.html for version 1.4.0
The steps from the doc work for me.

Lalit Choudhary July 17, 2023 at 6:04 AM

Workaround: After the upgrade, scale the replicas manually.

Example:

kubectl -n pgo scale --replicas=2 perconapgcluster/cluster1
Done

Details

Needs QA

Yes

Created July 14, 2023 at 4:59 AM
Updated March 8, 2024 at 2:05 PM
Resolved August 31, 2023 at 7:45 AM