MongoDB cluster cannot fail over after all pods go down when exposed in External mode (NodePort and LoadBalancer)

Description

When the cluster is exposed in External mode (NodePort or LoadBalancer) and all pods go down at the same time, the MongoDB cluster cannot fail over and does not recover.

Steps to Reproduce:

Force-delete all pods in the replica set (kubectl delete pod <pods> --force -n <namespace>), or shut down all nodes in the Kubernetes cluster.
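A minimal Go sketch of the first reproduction path, assuming a three-member replica set whose pods follow the <cluster>-<replset>-<ordinal> naming seen in the operator log in this report (mongo-psmdb-db-rs0-0 to mongo-psmdb-db-rs0-2 in namespace tungdt). The helper forceDeleteArgs is hypothetical; it only assembles the kubectl arguments and prints the resulting command instead of running it.

```go
package main

import (
	"fmt"
	"strings"
)

// forceDeleteArgs is a hypothetical helper that builds kubectl arguments to
// force-delete every pod of one replica set. Pod names are assumed to follow
// the <cluster>-<replset>-<ordinal> pattern visible in the operator log.
func forceDeleteArgs(cluster, replset, namespace string, size int) []string {
	args := []string{"delete", "pod"}
	for i := 0; i < size; i++ {
		args = append(args, fmt.Sprintf("%s-%s-%d", cluster, replset, i))
	}
	// --grace-period=0 together with --force removes the pods immediately,
	// without waiting for mongod to shut down cleanly.
	return append(args, "--grace-period=0", "--force", "-n", namespace)
}

func main() {
	args := forceDeleteArgs("mongo-psmdb-db", "rs0", "tungdt", 3)
	// Prints: kubectl delete pod mongo-psmdb-db-rs0-0 mongo-psmdb-db-rs0-1 mongo-psmdb-db-rs0-2 --grace-period=0 --force -n tungdt
	fmt.Println("kubectl " + strings.Join(args, " "))
}
```

To actually run it, the same slice could be passed to exec.Command("kubectl", args...) from os/exec; printing it keeps the sketch from deleting anything by accident.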

Version:

Percona Operator for MongoDB 1.15.0

Logs:

Operator log:
2024-04-25T08:48:42.241Z ERROR failed to reconcile cluster {"controller": "psmdb-controller", "object": {"name":"mongo-psmdb-db","namespace":"tungdt"}, "namespace": "tungdt", "name": "mongo-psmdb-db", "reconcileID": "7614bf82-1bf1-43e0-b4ab-f07b6c5a358c", "replset": "rs0", "error": "dial: ping mongo: server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongo-psmdb-db-rs0-0.mongo-psmdb-db-rs0.tungdt.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup mongo-psmdb-db-rs0-0.mongo-psmdb-db-rs0.tungdt.svc.cluster.local on 10.43.0.10:53: no such host }, { Addr: mongo-psmdb-db-rs0-1.mongo-psmdb-db-rs0.tungdt.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup mongo-psmdb-db-rs0-1.mongo-psmdb-db-rs0.tungdt.svc.cluster.local on 10.43.0.10:53: no such host }, { Addr: mongo-psmdb-db-rs0-2.mongo-psmdb-db-rs0.tungdt.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup mongo-psmdb-db-rs0-2.mongo-psmdb-db-rs0.tungdt.svc.cluster.local on 10.43.0.10:53: no such host }, ] }", "errorVerbose": "server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongo-psmdb-db-rs0-0.mongo-psmdb-db-rs0.tungdt.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup mongo-psmdb-db-rs0-0.mongo-psmdb-db-rs0.tungdt.svc.cluster.local on 10.43.0.10:53: no such host }, { Addr: mongo-psmdb-db-rs0-1.mongo-psmdb-db-rs0.tungdt.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup mongo-psmdb-db-rs0-1.mongo-psmdb-db-rs0.tungdt.svc.cluster.local on 10.43.0.10:53: no such host }, { Addr: mongo-psmdb-db-rs0-2.mongo-psmdb-db-rs0.tungdt.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup mongo-psmdb-db-rs0-2.mongo-psmdb-db-rs0.tungdt.svc.cluster.local on 10.43.0.10:53: no such host }, ] }\nping mongo\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo.Dial\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo/mongo.go:112\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb.MongoClient\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/client.go:62\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*mongoClientProvider).Mongo\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/connections.go:38\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).mongoClientWithRole\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/connections.go:60\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:87\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:498\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\ndial\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:93\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:498\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile
/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:500
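Every member address in the operator log has the form <pod>.<headless service>.<namespace>.svc.cluster.local:27017, and every lookup against the cluster DNS server (10.43.0.10:53) fails with "no such host". One plausible reading, not confirmed in this report, is that these in-cluster names have no DNS records while all pods are gone, so the operator cannot reach any member even though the replica set is exposed through NodePort or LoadBalancer services. The Go sketch below only rebuilds the address pattern visible in the log; memberHost is a hypothetical name, not a function of the operator.

```go
package main

import "fmt"

// memberHost is a hypothetical helper that mirrors the per-member address
// format seen in the log: <cluster>-<replset>-<ordinal> is the pod name and
// <cluster>-<replset> is the headless service name. It does not reproduce
// any actual operator code.
func memberHost(cluster, replset, namespace string, ordinal int) string {
	svc := fmt.Sprintf("%s-%s", cluster, replset)
	return fmt.Sprintf("%s-%d.%s.%s.svc.cluster.local:27017", svc, ordinal, svc, namespace)
}

func main() {
	// Prints the three addresses that appear in the log above, one per line.
	for i := 0; i < 3; i++ {
		fmt.Println(memberHost("mongo-psmdb-db", "rs0", "tungdt", i))
	}
}
```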

Expected Result:

The cluster returns to normal operation, and rs.status() shows all replica set members in a ready state.
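The expected result can be expressed as a check over the rs.status() output. This Go sketch assumes that "ready" means every member reports health 1, exactly one member is PRIMARY and the rest are SECONDARY (no arbiters). The member and replSetStatus types are hypothetical and cover only a few of the fields returned by rs.status() (the replSetGetStatus command); the sample document is hand-written, not taken from this cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// member holds the subset of an rs.status() member document used here.
type member struct {
	Name     string  `json:"name"`
	Health   float64 `json:"health"`
	StateStr string  `json:"stateStr"`
}

// replSetStatus holds the subset of the rs.status() document used here.
type replSetStatus struct {
	Set     string   `json:"set"`
	Members []member `json:"members"`
}

// healthy applies an assumed definition of "ready": every member is up
// (health 1), exactly one is PRIMARY, and all others are SECONDARY.
func healthy(s replSetStatus) bool {
	primaries := 0
	for _, m := range s.Members {
		if m.Health != 1 {
			return false
		}
		switch m.StateStr {
		case "PRIMARY":
			primaries++
		case "SECONDARY":
			// Expected state for non-primary members; nothing to do.
		default:
			return false
		}
	}
	return primaries == 1
}

func main() {
	// Illustrative sample shaped like rs.status() output (not real data).
	raw := `{"set": "rs0", "members": [
		{"name": "member-0:27017", "health": 1, "stateStr": "PRIMARY"},
		{"name": "member-1:27017", "health": 1, "stateStr": "SECONDARY"},
		{"name": "member-2:27017", "health": 1, "stateStr": "SECONDARY"}
	]}`
	var st replSetStatus
	if err := json.Unmarshal([]byte(raw), &st); err != nil {
		panic(err)
	}
	fmt.Println(st.Set, "healthy:", healthy(st)) // rs0 healthy: true
}
```

A replica set with an arbiter would fail this check as written; the switch would need an extra "ARBITER" case for such a topology.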

Actual Result:

The cluster ends up in the RSGhost state and becomes inoperable.
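RSGhost is one of the server types MongoDB drivers assign while monitoring a deployment. The Go sketch below is a simplified rendering of how the Server Discovery and Monitoring (SDAM) specification derives that type from a hello reply: a server replying with isreplicaset: true is classified as RSGhost, which typically means mongod was started with a replica set name but has no valid replica set configuration. The helloReply struct and serverType function are illustrative assumptions, not part of any driver's public API.

```go
package main

import "fmt"

// helloReply is a hypothetical struct holding only the hello reply fields
// that the classification below reads.
type helloReply struct {
	OK                float64
	Msg               string
	IsReplicaSet      bool
	SetName           string
	Hidden            bool
	IsWritablePrimary bool
	Secondary         bool
	ArbiterOnly       bool
}

// serverType is a simplified rendering of the SDAM server-type rules.
func serverType(r helloReply) string {
	switch {
	case r.OK == 0:
		return "Unknown"
	case r.Msg == "isdbgrid":
		return "Mongos"
	case r.IsReplicaSet:
		// Running with a replica set name but without a usable config,
		// for example never initiated or not present in its own config.
		return "RSGhost"
	case r.SetName != "":
		switch {
		case r.IsWritablePrimary:
			return "RSPrimary"
		case r.Hidden:
			return "RSOther"
		case r.Secondary:
			return "RSSecondary"
		case r.ArbiterOnly:
			return "RSArbiter"
		default:
			return "RSOther"
		}
	default:
		return "Standalone"
	}
}

func main() {
	// Hypothetical reply from a member without a valid replica set config.
	ghost := helloReply{OK: 1, IsReplicaSet: true}
	fmt.Println(serverType(ghost)) // RSGhost
}
```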

Additional Information:

Environment

None

Activity

Yossi Chon May 17, 2024 at 11:35 AM

Can this be merged into 1.15.x?

Slava Sarzhan May 17, 2024 at 8:30 AM

The issue was fixed. Thanks for the fix.

Chung Trịnh Đức April 25, 2024 at 9:32 AM

Done

Details

Created April 25, 2024 at 9:28 AM
Updated September 9, 2024 at 1:31 PM
Resolved August 26, 2024 at 2:26 PM