mongos not restarted last on upgrade

Description

On a minor/major upgrade in a sharded cluster, the agreed order for upgrades/restarts is cfg -> rs0 -> mongos, but it seems that we are restarting the mongos pods as soon as the cfg upgrade starts.
I reproduced this on a minor upgrade from 4.4.5 to 4.4.6 and on a major upgrade from 4.0 -> 4.2 -> 4.4.

pod list

NAME                                              READY   STATUS        RESTARTS   AGE
my-cluster-name-cfg-0                             2/2     Running       0          7m54s
my-cluster-name-cfg-1                             2/2     Running       2          7m20s
my-cluster-name-cfg-2                             0/2     Init:0/1      0          4s
my-cluster-name-mongos-79b479798b-jm6n9           1/1     Running       0          14s
my-cluster-name-mongos-79b479798b-kcm26           0/1     Init:0/1      0          0s
my-cluster-name-mongos-b54f59498-5s627            1/1     Running       0          7m50s
my-cluster-name-mongos-b54f59498-cfcfn            1/1     Terminating   0          7m50s
my-cluster-name-rs0-0                             2/2     Running       0          7m53s
my-cluster-name-rs0-1                             2/2     Running       0          7m30s
my-cluster-name-rs0-2                             2/2     Running       0          6m58s
percona-server-mongodb-operator-d859b69b6-nd5vk   1/1     Running       0          8m30s

operator log:

{"level":"info","ts":1624953721.9316542,"logger":"controller_psmdb","msg":"adding rs to shard","rs":"rs0"}
{"level":"info","ts":1624953724.2957687,"logger":"controller_psmdb","msg":"added to shard","rs":"rs0"}
{"level":"info","ts":1624953980.3187804,"logger":"controller_psmdb","msg":"add new job","name":"ensure-version/psmdb-test/my-cluster-name","schedule":"* * * * *"}
{"level":"info","ts":1624954020.3514607,"logger":"controller_psmdb","msg":"update Mongo version from 4.4.5-7 to 4.4.6-8"}
{"level":"info","ts":1624954023.9866426,"logger":"controller_psmdb","msg":"waiting for mongos update"}
{"level":"info","ts":1624954024.0588324,"logger":"controller_psmdb","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-cfg"}
{"level":"info","ts":1624954024.2644098,"logger":"controller_psmdb","msg":"balancer disabled"}
{"level":"info","ts":1624954024.2998755,"logger":"controller_psmdb","msg":"primary pod is my-cluster-name-cfg-0.my-cluster-name-cfg.psmdb-test.svc.cluster.local:27017"}
{"level":"info","ts":1624954024.2999463,"logger":"controller_psmdb","msg":"apply changes to secondary pod my-cluster-name-cfg-2"}
{"level":"info","ts":1624954055.9410696,"logger":"controller_psmdb","msg":"pod my-cluster-name-cfg-2 started"}
{"level":"info","ts":1624954055.9411113,"logger":"controller_psmdb","msg":"apply changes to secondary pod my-cluster-name-cfg-1"}
{"level":"info","ts":1624954074.2855165,"logger":"controller_psmdb","msg":"pod my-cluster-name-cfg-1 started"}
{"level":"info","ts":1624954074.2855618,"logger":"controller_psmdb","msg":"doing step down...","force":false}
{"level":"info","ts":1624954074.2889597,"logger":"controller_psmdb","msg":"apply changes to primary pod my-cluster-name-cfg-0"}
{"level":"info","ts":1624954094.7494006,"logger":"controller_psmdb","msg":"pod my-cluster-name-cfg-0 started"}
{"level":"info","ts":1624954094.7494402,"logger":"controller_psmdb","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-cfg"}
{"level":"info","ts":1624954094.8101466,"logger":"controller_psmdb","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-rs0"}
{"level":"info","ts":1624954094.8851542,"logger":"controller_psmdb","msg":"primary pod is my-cluster-name-rs0-0.my-cluster-name-rs0.psmdb-test.svc.cluster.local:27017"}
{"level":"info","ts":1624954094.8851902,"logger":"controller_psmdb","msg":"apply changes to secondary pod my-cluster-name-rs0-2"}
{"level":"info","ts":1624954127.4779322,"logger":"controller_psmdb","msg":"pod my-cluster-name-rs0-2 started"}
{"level":"info","ts":1624954127.47801,"logger":"controller_psmdb","msg":"apply changes to secondary pod my-cluster-name-rs0-1"}
{"level":"info","ts":1624954145.839198,"logger":"controller_psmdb","msg":"pod my-cluster-name-rs0-1 started"}
{"level":"info","ts":1624954145.8392763,"logger":"controller_psmdb","msg":"doing step down...","force":false}
{"level":"info","ts":1624954145.8428488,"logger":"controller_psmdb","msg":"apply changes to primary pod my-cluster-name-rs0-0"}
{"level":"info","ts":1624954167.4697785,"logger":"controller_psmdb","msg":"pod my-cluster-name-rs0-0 started"}
{"level":"info","ts":1624954167.4698262,"logger":"controller_psmdb","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-rs0"}
{"level":"info","ts":1624954167.6977565,"logger":"controller_psmdb","msg":"balancer enabled"}

The automated test should also be fixed, because it checks the upgrade order:
https://github.com/percona/percona-server-mongodb-operator/blob/main/e2e-tests/upgrade-sharded/run#L218-L222
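For illustration only (not part of the e2e test), the cfg -> rs0 ordering can be verified directly from the smart-update timestamps in the operator log above. The sketch below parses a few of the log lines quoted in this ticket; note that mongos never appears in any smart-update event at all, which matches the reported problem of its pods restarting on their own during the cfg update.

```python
import json

# A subset of the operator log lines from this ticket, limited to the
# events that mark the start and end of each statefulset's smart update.
log_lines = [
    '{"level":"info","ts":1624954024.0588324,"logger":"controller_psmdb","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-cfg"}',
    '{"level":"info","ts":1624954094.7494402,"logger":"controller_psmdb","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-cfg"}',
    '{"level":"info","ts":1624954094.8101466,"logger":"controller_psmdb","msg":"statefullSet was changed, start smart update","name":"my-cluster-name-rs0"}',
    '{"level":"info","ts":1624954167.4698262,"logger":"controller_psmdb","msg":"smart update finished for statefulset","statefulset":"my-cluster-name-rs0"}',
]

# Collect start/finish timestamps per statefulset. The start event carries
# the name in "name", the finish event in "statefulset".
events = {}
for line in log_lines:
    rec = json.loads(line)
    sts = rec.get("name") or rec.get("statefulset")
    if rec["msg"].startswith("statefullSet was changed"):
        events.setdefault(sts, {})["start"] = rec["ts"]
    elif rec["msg"].startswith("smart update finished"):
        events.setdefault(sts, {})["finish"] = rec["ts"]

# cfg must finish before rs0 begins; that part of the ordering holds.
assert events["my-cluster-name-cfg"]["finish"] <= events["my-cluster-name-rs0"]["start"]

print(sorted(events, key=lambda s: events[s]["start"]))
# → ['my-cluster-name-cfg', 'my-cluster-name-rs0']
```

The mongos restart is driven by the Deployment rollout rather than a smart-update event, so the pod list (mongos pods Terminating while cfg-2 is still in Init) is the only place the wrong ordering is visible.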

Environment

None

Attachments

1
  • 29 Jun 2021, 11:21 AM

Smart Checklist

Activity

Tomislav Plavcic September 2, 2021 at 12:11 PM

I was not able to reproduce this. I tried a few more times, so either it's not an issue or it is very sporadic.
I will close the ticket for now; if the issue comes up again it can be re-evaluated.

Sergey Pronin August 31, 2021 at 1:18 PM

Were you able to reproduce it?

Tomislav Plavcic June 29, 2021 at 11:23 AM

Current status is that the issue seems sporadic, or the steps to reproduce are uncertain, so I'm assigning it to myself to investigate.
Possible steps:
start with:


add data with:

kubectl run -i --rm --tty percona-client --image=percona/percona-server-mongodb:4.4 --restart=Never -- mongo "mongodb://userAdmin:userAdmin123456@my-cluster-name-mongos.psmdb-test.svc.cluster.local/admin?ssl=false" --eval 'db.createUser({ user: "dba", pwd: "test1234", roles: [ "root" ] });'

kubectl run -it --rm ycsb-client --image=plavi/test:ycsb --restart=Never -- load mongodb -s -P /ycsb/workloads/workloada -p recordcount=100000 -threads 8 -p mongodb.url="mongodb://dba:test1234@my-cluster-name-mongos.psmdb-test.svc.cluster.local/ycsb_test?ssl=false&authSource=admin&connectTimeoutMS=300000" -p mongodb.auth="true"

patch with:

kubectl patch psmdb my-cluster-name --type=merge --patch '{"spec": {"upgradeOptions":{ "apply": "recommended" }}}'

check the upgrade order.

Cannot Reproduce

Details

Assignee

Reporter

Affects versions

Priority

Created June 29, 2021 at 8:16 AM
Updated March 5, 2024 at 4:51 PM
Resolved September 2, 2021 at 12:13 PM
