Unexposing replicaset nodes breaks MongoDB cluster
Description
Environment
None
Activity
Aaditya Dubey December 25, 2024 at 12:18 PM
Hi @Roman Kosyk
We still haven't heard any news from you. So, I assume the issue no longer persists and will close the ticket. If you disagree, reply and create a follow-up with a new Jira report.
Aaditya Dubey September 6, 2024 at 2:04 PM
Hi @Roman Kosyk
Thank you for the report. Unfortunately, I can't repeat the behaviour. Please share the complete repeatable test case so we can further debug the issue.
[K8SPSMDB-1068]$ kubectl logs --tail=20 minimal-cluster-cfg-0
Defaulted container "mongod" out of: mongod, mongo-init (init)
{"t":{"$date":"2024-09-06T13:58:12.091+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn3441","msg":"client metadata","attr":{"remote":"10.244.0.5:33514","client":"conn3441","doc":{"driver":{"name":"mongo-go-driver","version":"v1.12.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.20.9"}}}
{"t":{"$date":"2024-09-06T13:58:12.091+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn3442","msg":"client metadata","attr":{"remote":"10.244.0.5:33520","client":"conn3442","doc":{"driver":{"name":"mongo-go-driver","version":"v1.12.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.20.9"}}}
{"t":{"$date":"2024-09-06T13:58:12.091+00:00"},"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.244.0.5:33534","uuid":"7ebdaa36-123b-43c7-b97b-925760c1f84e","connectionId":3443,"connectionCount":10}}
{"t":{"$date":"2024-09-06T13:58:12.092+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn3443","msg":"client metadata","attr":{"remote":"10.244.0.5:33534","client":"conn3443","doc":{"driver":{"name":"mongo-go-driver","version":"v1.12.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.20.9"}}}
{"t":{"$date":"2024-09-06T13:58:12.101+00:00"},"s":"I", "c":"ACCESS", "id":20250, "ctx":"conn3443","msg":"Authentication succeeded","attr":{"mechanism":"SCRAM-SHA-256","speculative":true,"principalName":"clusterAdmin","authenticationDatabase":"admin","remote":"10.244.0.5:33534","extraInfo":{}}}
{"t":{"$date":"2024-09-06T13:58:12.102+00:00"},"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.244.0.5:33550","uuid":"c43f1da0-d1ab-4903-83be-74f36920656d","connectionId":3444,"connectionCount":11}}
{"t":{"$date":"2024-09-06T13:58:12.103+00:00"},"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.244.0.5:33552","uuid":"e12e883e-01bb-4ec9-bccd-57c383f8d422","connectionId":3445,"connectionCount":12}}
{"t":{"$date":"2024-09-06T13:58:12.103+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn3444","msg":"client metadata","attr":{"remote":"10.244.0.5:33550","client":"conn3444","doc":{"driver":{"name":"mongo-go-driver","version":"v1.12.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.20.9"}}}
{"t":{"$date":"2024-09-06T13:58:12.103+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn3445","msg":"client metadata","attr":{"remote":"10.244.0.5:33552","client":"conn3445","doc":{"driver":{"name":"mongo-go-driver","version":"v1.12.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.20.9"}}}
{"t":{"$date":"2024-09-06T13:58:12.104+00:00"},"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.244.0.5:33556","uuid":"136db7d4-ec20-4bf9-bec8-da3c7a8a28bd","connectionId":3446,"connectionCount":13}}
{"t":{"$date":"2024-09-06T13:58:12.104+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn3446","msg":"client metadata","attr":{"remote":"10.244.0.5:33556","client":"conn3446","doc":{"driver":{"name":"mongo-go-driver","version":"v1.12.1"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.20.9"}}}
{"t":{"$date":"2024-09-06T13:58:12.112+00:00"},"s":"I", "c":"ACCESS", "id":20250, "ctx":"conn3446","msg":"Authentication succeeded","attr":{"mechanism":"SCRAM-SHA-256","speculative":true,"principalName":"userAdmin","authenticationDatabase":"admin","remote":"10.244.0.5:33556","extraInfo":{}}}
{"t":{"$date":"2024-09-06T13:58:12.117+00:00"},"s":"I", "c":"-", "id":20883, "ctx":"conn3445","msg":"Interrupted operation as its client disconnected","attr":{"opId":54411}}
{"t":{"$date":"2024-09-06T13:58:12.117+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn3444","msg":"Connection ended","attr":{"remote":"10.244.0.5:33550","uuid":"c43f1da0-d1ab-4903-83be-74f36920656d","connectionId":3444,"connectionCount":12}}
{"t":{"$date":"2024-09-06T13:58:12.117+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn3446","msg":"Connection ended","attr":{"remote":"10.244.0.5:33556","uuid":"136db7d4-ec20-4bf9-bec8-da3c7a8a28bd","connectionId":3446,"connectionCount":11}}
{"t":{"$date":"2024-09-06T13:58:12.117+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn3445","msg":"Connection ended","attr":{"remote":"10.244.0.5:33552","uuid":"e12e883e-01bb-4ec9-bccd-57c383f8d422","connectionId":3445,"connectionCount":10}}
{"t":{"$date":"2024-09-06T13:58:12.118+00:00"},"s":"I", "c":"-", "id":20883, "ctx":"conn3441","msg":"Interrupted operation as its client disconnected","attr":{"opId":54405}}
{"t":{"$date":"2024-09-06T13:58:12.118+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn3442","msg":"Connection ended","attr":{"remote":"10.244.0.5:33520","uuid":"d6ab8d2c-c12a-45e2-8f72-602c8544482f","connectionId":3442,"connectionCount":9}}
{"t":{"$date":"2024-09-06T13:58:12.118+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn3443","msg":"Connection ended","attr":{"remote":"10.244.0.5:33534","uuid":"7ebdaa36-123b-43c7-b97b-925760c1f84e","connectionId":3443,"connectionCount":8}}
{"t":{"$date":"2024-09-06T13:58:12.118+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn3441","msg":"Connection ended","attr":{"remote":"10.244.0.5:33514","uuid":"88608311-59d3-4a6d-8253-5882f94e50c6","connectionId":3441,"connectionCount":7}}
Roman Kosyk April 17, 2024 at 12:44 PM
I was not able to attach a screenshot in the description, so I am leaving it here:
https://snipboard.io/SHzkLa.jpg
My operator version: 1.15.0
Kubernetes version: 1.25.16
For example, given a minimal cluster with exposed replica set nodes and clusterServiceDNSMode set to External:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: minimal-cluster
spec:
  crVersion: 1.15.0
  image: percona/percona-server-mongodb:6.0.9-7
  allowUnsafeConfigurations: true
  clusterServiceDNSMode: External
  replsets:
  - name: rs0
    size: 1
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 3Gi
    expose:
      enabled: true
      exposeType: ClusterIP
  sharding:
    enabled: true
    configsvrReplSet:
      size: 1
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
      expose:
        enabled: true
        exposeType: ClusterIP
    mongos:
      size: 1
Changing expose.enabled for the replica sets from true to false breaks the cluster.
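One way to apply that change reproducibly is a JSON Patch against the custom resource. The sketch below only builds the patch and prints a kubectl command; it does not talk to a cluster. It assumes the resource is reachable under the short name psmdb and that only the entries under spec.replsets are unexposed (the config server replica set is left alone). The helper name unexpose_patch is hypothetical.

```python
import json
import shlex


def unexpose_patch(cr):
    """Build JSON Patch operations that set expose.enabled to false
    for every spec.replsets entry that already has that field."""
    ops = []
    for index, replset in enumerate(cr.get("spec", {}).get("replsets", [])):
        if "enabled" in replset.get("expose", {}):
            ops.append({
                "op": "replace",
                "path": f"/spec/replsets/{index}/expose/enabled",
                "value": False,
            })
    return ops


# Trimmed copy of the manifest from this report (only the fields the patch needs).
cr = {
    "spec": {
        "replsets": [
            {"name": "rs0", "size": 1,
             "expose": {"enabled": True, "exposeType": "ClusterIP"}},
        ],
    },
}

patch = unexpose_patch(cr)
# Assumption: "psmdb" is the short name of the PerconaServerMongoDB resource.
cmd = ["kubectl", "patch", "psmdb", "minimal-cluster",
       "--type=json", "-p", json.dumps(patch)]
print(shlex.join(cmd))
```

Editing the manifest and re-applying it with kubectl apply would have the same effect; the patch form just makes the single changed field explicit.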
After this change, the config server logs show error messages like this one:
{"t":{"$date":"2024-04-17T12:18:00.104+00:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set","attr":{"replicaSet":"rs0","host":"172.20.193.33:27017","error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit","errmsg":"Couldn't get a connection within the time limit"},"action":{"dropConnections":false,"requestImmediateCheck":false,"outcome":{"host":"172.20.193.33:27017","success":false,"errorMessage":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit"}}}}
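Because mongod writes one JSON document per log line, entries like this can be pulled out of a longer log mechanically. The following is a minimal sketch, not part of the original report: it filters on the log id 4712102 seen in the entry quoted here and prints the host and error name. The function name failed_hosts is hypothetical, and the sample line is a trimmed copy of that entry.

```python
import json

HOST_FAILED_ID = 4712102  # id of the "Host failed in replica set" entry in this report


def failed_hosts(lines):
    """Yield (timestamp, replica set, host, error name) for each host-failure entry.
    Lines that are not valid JSON (for example kubectl banners) are skipped."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("id") != HOST_FAILED_ID:
            continue
        attr = entry.get("attr", {})
        yield (
            entry.get("t", {}).get("$date"),
            attr.get("replicaSet"),
            attr.get("host"),
            attr.get("error", {}).get("codeName"),
        )


# Trimmed version of the log entry from this report.
sample = (
    '{"t":{"$date":"2024-04-17T12:18:00.104+00:00"},"s":"I","c":"NETWORK","id":4712102,'
    '"ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set",'
    '"attr":{"replicaSet":"rs0","host":"172.20.193.33:27017",'
    '"error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit"}}}'
)

results = list(failed_hosts(["Defaulted container banner line", sample]))
for row in results:
    print(*row)
```

In practice the lines argument could be sys.stdin, with the output of kubectl logs for the config server pod piped in.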
For some reason the config server still tries to connect to "host":"172.20.193.33:27017". This is the IP address the host had while it was exposed. Now that the host is no longer exposed, that IP address does not exist and the connection fails.
One possible reason is that the hosts in the shards collection of the config database on the config server are not updated after the nodes are unexposed, as shown in the screenshot at https://snipboard.io/SHzkLa.jpg.
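To check that suspicion without a screenshot, the host field of a config.shards document can be compared with the member addresses expected after unexposing. The sketch below works on a plain dictionary, so it runs without a cluster. The sample document and the expected pod DNS name are assumptions for illustration (the real document can be read in mongosh with db.getSiblingDB("config").shards.find()), and the helper names are hypothetical.

```python
def parse_shard_hosts(host_field):
    """Split a config.shards 'host' value such as 'rs0/a:27017,b:27017'
    into (replica set name or None, list of member addresses)."""
    if "/" in host_field:
        rs_name, members = host_field.split("/", 1)
    else:
        rs_name, members = None, host_field
    return rs_name, [m for m in members.split(",") if m]


def stale_members(shard_doc, expected_members):
    """Return members registered for the shard that are not in expected_members."""
    _, members = parse_shard_hosts(shard_doc["host"])
    expected = set(expected_members)
    return [m for m in members if m not in expected]


# Hypothetical document illustrating the suspected state: the shard still
# lists the address that was in use while the node was exposed.
shard_doc = {"_id": "rs0", "host": "rs0/172.20.193.33:27017", "state": 1}

# Assumption: after unexposing, the member is addressed by its pod DNS name
# in the "default" namespace.
expected = ["minimal-cluster-rs0-0.minimal-cluster-rs0.default.svc.cluster.local:27017"]

stale = stale_members(shard_doc, expected)
print(stale)
```

A non-empty result would be consistent with the shard entry not being refreshed after the expose setting changed.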