Upgrade from Operator 1.3 to 1.4 ends up with a cluster without any replicas
Activity
Slava Sarzhan July 19, 2023 at 1:02 PM
@Jobin Augustine I can confirm that we have a bug with pause in 1.4.0, and we need to fix it. Please use the new version of the official upgrade instructions, https://docs.percona.com/percona-operator-for-postgresql/update.html, to update the Operator. You will have an upgrade without any downtime at all.
Jobin Augustine July 18, 2023 at 7:46 AM
Hi @natalia.marukovich
Regarding,
> But why do you use this procedure? We have other steps in the doc https://docs.percona.com/percona-operator-for-postgresql/update.html for version 1.4.0
> Steps from doc work for me.
Does it really work for you? I am surprised. @Lalit Choudhary has already created another ticket for the poor documentation: https://jira.percona.com/browse/K8SPG-403
That official document contains only one of the steps in the entire multi-step manual process for "Upgrading the Operator".
There are pre and post steps required. Please see the comments by Lalit and Ege on the ticket https://perconadev.atlassian.net/browse/K8SPG-403#icft=K8SPG-403
The single step mentioned in the official documentation is also part of the steps I documented here. Please see the section: Edit the `operator.yaml` file of 1.4 to modify `DEPLOY_ACTION` to `update`
natalia.marukovich July 17, 2023 at 9:28 AM (edited)
@Jobin Augustine
I'm rechecking your steps. But why do you use this procedure? We have other steps in the doc https://docs.percona.com/percona-operator-for-postgresql/update.html for version 1.4.0
Steps from doc work for me.
Lalit Choudhary July 17, 2023 at 6:04 AM
Workaround: After the upgrade, scale the replicas manually.
Example:
kubectl -n pgo scale --replicas=2 perconapgcluster/cluster1
Details
Assignee: Unassigned
Reporter: Jobin Augustine
Needs QA: Yes
Priority: Medium

Independently tested and reproduced the customer case.
Existing cluster with PGO 1.3
$ kubectl get pods
NAME                                             READY   STATUS      RESTARTS   AGE
postgres-operator-64b576cf76-4g54w               4/4     Running     0          16m
pgo-deploy--1-9nh9f                              0/1     Completed   0          17m
cluster1-backrest-shared-repo-5c78d4fb49-9dnxs   1/1     Running     0          14m
cluster1-pgbouncer-6b9fbb7954-btq2z              1/1     Running     0          12m
cluster1-pgbouncer-6b9fbb7954-5tfdc              1/1     Running     0          12m
cluster1-pgbouncer-6b9fbb7954-xzqq8              1/1     Running     0          12m
backrest-backup-cluster1--1-2mps2                0/1     Completed   0          10m
cluster1-694699d8f4-vmq7b                        1/1     Running     0          14m
cluster1-repl1-6b9677bcf7-2pwp9                  1/1     Running     0          10m
cluster1-repl2-55c6dd8f5-rjswq                   1/1     Running     0          10m
Connect to primary and verify everything
$ kubectl exec -it cluster1-694699d8f4-vmq7b -- bash
bash-4.4$ psql
psql (14.4)
Type "help" for help.
postgres=# create table t1 (id int,nm varchar);
CREATE TABLE
postgres=# insert into t1 values (1,'Jobin');
INSERT 0 1
Please note the PostgreSQL version: 14.4
Pause the cluster
$ kubectl edit perconapgclusters.pg.percona.com cluster1
perconapgcluster.pg.percona.com/cluster1 edited
Verify that everything is paused
$ kubectl get pods
NAME                                 READY   STATUS      RESTARTS   AGE
postgres-operator-64b576cf76-4g54w   4/4     Running     0          22m
pgo-deploy--1-9nh9f                  0/1     Completed   0          23m
backrest-backup-cluster1--1-2mps2    0/1     Completed   0          16m
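The pause check above can also be scripted rather than read by eye. The following is a minimal sketch, not part of the original report: the function names `cluster_pods` and `is_paused` are hypothetical, and it assumes that pods belonging to the cluster are named `<cluster>-...`, as they are in the listings in this ticket.

```shell
# Print the names of pods that belong to a cluster, reading
# `kubectl get pods --no-headers` output on stdin.
# Assumption: cluster pods are named "<cluster>-...".
cluster_pods() {
  local cluster="$1"
  awk -v prefix="${cluster}-" 'index($1, prefix) == 1 { print $1 }'
}

# Succeed only when no cluster pods are left, i.e. the pause took effect.
is_paused() {
  local cluster="$1"
  [ -z "$(cluster_pods "$cluster")" ]
}

# Usage against a live cluster (not run here):
#   kubectl get pods --no-headers | is_paused cluster1 && echo "cluster1 is paused"
```

Completed job pods such as backrest-backup-cluster1--1-2mps2 do not start with the cluster name, so they are not counted.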
Delete the following from the existing cluster
kubectl delete \
  serviceaccounts/pgo-deployer-sa \
  clusterroles/pgo-deployer-cr \
  configmaps/pgo-deployer-cm \
  configmaps/pgo-config \
  clusterrolebindings/pgo-deployer-crb \
  jobs.batch/pgo-deploy \
  deployment/postgres-operator
Get the files for the Operator 1.4
cd ~/Downloads/
rm -rf percona-postgresql-operator
git clone -b v1.4.0 https://github.com/percona/percona-postgresql-operator
cd percona-postgresql-operator
Edit the `operator.yaml` file of 1.4 to modify `DEPLOY_ACTION` to `update`
vi deploy/operator.yaml
...
env:
  - name: DEPLOY_ACTION
    value: update
...
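For repeated test runs, the manual `vi` edit could be replaced by a scripted one. This is only a sketch under assumptions: the helper name `set_deploy_action` is hypothetical, and it assumes the `value:` line sits directly below the `- name: DEPLOY_ACTION` line, as in the snippet above. Check the resulting file before applying it.

```shell
# Rewrite the value of DEPLOY_ACTION in an operator manifest.
# Assumption: the "value:" line immediately follows the
# "- name: DEPLOY_ACTION" line.
set_deploy_action() {
  local file="$1" action="$2"
  # On the line after the DEPLOY_ACTION name, replace whatever value is set.
  sed "/name: DEPLOY_ACTION/{n;s/value: .*/value: ${action}/;}" "$file" > "${file}.tmp" \
    && mv "${file}.tmp" "$file"
}

# Usage (not run here):
#   set_deploy_action deploy/operator.yaml update
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.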
Create the operator
$ kubectl apply -f operator.yaml
serviceaccount/pgo-deployer-sa created
clusterrole.rbac.authorization.k8s.io/pgo-deployer-cr created
configmap/pgo-deployer-cm created
clusterrolebinding.rbac.authorization.k8s.io/pgo-deployer-crb created
job.batch/pgo-deploy created
Wait for two minutes and verify that the Operator is ready
sleep 120
kubectl get pods
NAME                                 READY   STATUS      RESTARTS   AGE
backrest-backup-cluster1--1-2mps2    0/1     Completed   0          27m
postgres-operator-6658579bdd-jlm76   4/4     Running     0          2m22s
pgo-deploy--1-qrbzn                  0/1     Completed   0          2m59s
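Instead of a fixed two-minute sleep, the readiness check could poll the pod list until the Operator pod reports all of its containers ready. The sketch below is an illustration only: `operator_ready` and `wait_for_operator` are hypothetical names, and the defaults of 24 tries at 5-second intervals are chosen simply to match the two minutes used above.

```shell
# Succeed when a postgres-operator pod is Running with all containers ready.
# Reads `kubectl get pods --no-headers` output on stdin.
operator_ready() {
  awk 'index($1, "postgres-operator-") == 1 {
         split($2, r, "/")                      # READY column, e.g. "4/4"
         if (r[1] == r[2] && $3 == "Running") ok = 1
       }
       END { exit (ok ? 0 : 1) }'
}

# Poll until the Operator is ready or the tries run out.
wait_for_operator() {
  local tries="${1:-24}" i=0
  while [ "$i" -lt "$tries" ]; do
    if kubectl get pods --no-headers | operator_ready; then
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  return 1
}

# Usage against a live cluster (not run here):
#   wait_for_operator && kubectl get pods
```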
Verify that `cr.yaml` uses compatible images (PostgreSQL version) and apply the `cr.yaml`
$ kubectl apply -f cr.yaml
perconapgcluster.pg.percona.com/cluster1 configured
Wait for 3+ minutes and check the status
$ kubectl get pods
NAME                                             READY   STATUS      RESTARTS   AGE
backrest-backup-cluster1--1-2mps2                0/1     Completed   0          36m
postgres-operator-6658579bdd-jlm76               4/4     Running     0          11m
pgo-deploy--1-qrbzn                              0/1     Completed   0          11m
cluster1-backrest-shared-repo-75f4dd76d8-hs5jr   1/1     Running     0          7m4s
cluster1-pgbouncer-6875db664f-sxn9c              1/1     Running     0          4m19s
cluster1-pgbouncer-6875db664f-nr229              1/1     Running     0          4m19s
cluster1-pgbouncer-6875db664f-6l9dw              1/1     Running     0          4m19s
cluster1-backrest-shared-repo-75f4dd76d8-gdz82   1/1     Running     0          4m19s
cluster1-backrest-shared-repo-75f4dd76d8-7jn4g   1/1     Running     0          4m19s
cluster1-649b45fd6d-5nms5                        1/1     Running     0          5m5s
We can see that there are no replica pods.
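The missing replicas can be detected automatically by counting replica pods before and after the upgrade. A minimal sketch, assuming replica pods follow the `<cluster>-repl<N>-...` naming seen in the 1.3 listing in this ticket; the function name `count_replica_pods` is hypothetical.

```shell
# Count replica pods of a cluster, reading
# `kubectl get pods --no-headers` output on stdin.
# Assumption: replica pods are named "<cluster>-repl<N>-...".
count_replica_pods() {
  local cluster="$1"
  awk -v re="^${cluster}-repl[0-9]+-" '$1 ~ re { n++ } END { print n + 0 }'
}

# Usage against a live cluster (not run here):
#   before=$(kubectl get pods --no-headers | count_replica_pods cluster1)
#   ... perform the upgrade ...
#   after=$(kubectl get pods --no-headers | count_replica_pods cluster1)
#   [ "$after" -eq "$before" ] || echo "replicas missing: $before -> $after"
```

Applied to the listings in this ticket, the count drops from 2 before the upgrade to 0 after it.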