MongoDB is transiently unavailable after the mongos servers become ready when the cluster first boots up
General
Escalation
Description
When the cluster first boots up, MongoDB is transiently unavailable for writes even though the mongos servers already show Ready status.
MongoDB does become available some time after the mongos servers become ready. We think the MongoDB cluster is not yet fully ready at the point where the current mongos readiness probe starts succeeding.
To Reproduce
Steps to reproduce the behavior:
Deploy any MongoDB cluster with sharding:
Deploy a MongoDB client that keeps trying to perform a write. In our case, we use the MongoDB Go driver to call collection.InsertOne() with a document.
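The write loop in the second step can be sketched roughly as follows. This is a minimal illustration, not the client used in the report: `insertFunc`, `countFailedWrites`, and `fakeCluster` are hypothetical names, and the fake simply returns the error text seen in this issue for a fixed number of attempts. In a real reproduction, the `insertFunc` would wrap a `collection.InsertOne()` call made through the MongoDB Go driver.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// insertFunc is a hypothetical stand-in for one write attempt. With the
// MongoDB Go driver it would wrap a collection.InsertOne call.
type insertFunc func() error

// countFailedWrites calls insert up to maxAttempts times, sleeping for
// interval after each failure. It returns the number of failed attempts
// before the first success, or maxAttempts and the last error if no
// attempt succeeded.
func countFailedWrites(insert insertFunc, maxAttempts int, interval time.Duration) (int, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err := insert()
		if err == nil {
			return attempt, nil
		}
		lastErr = err
		time.Sleep(interval)
	}
	return maxAttempts, lastErr
}

// fakeCluster simulates (as an assumption, for illustration only) a cluster
// that rejects the first `failures` writes with "shard not found".
func fakeCluster(failures int) insertFunc {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("shard not found")
		}
		return nil
	}
}

func main() {
	failed, err := countFailedWrites(fakeCluster(3), 10, 10*time.Millisecond)
	fmt.Printf("failed writes before first success: %d, final error: %v\n", failed, err)
}
```

Counting failed attempts after the mongos pods report Ready gives a rough measure of how long the unavailability window lasts.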
Expected behavior
Once the mongos servers are ready, the cluster should be fully ready for client workloads.
Current behavior
Although the mongos servers' readiness probes succeed, the InsertOne workloads fail with errors saying
shard not found
Root Cause
The current readiness probe implementation for mongos lists the admin db and checks whether that request executes successfully. However, even when the admin db can be listed, the MongoDB cluster may not yet be ready for write workloads, which causes the unavailability.
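One possible direction for a stricter check, offered only as a sketch and not as the project's actual fix, is to also require that the `listShards` admin command reports at least one shard. This assumes the failure window corresponds to shards not yet being registered with the cluster, which this report does not confirm. The helper name `shardsRegistered` is hypothetical, and the example assumes the command reply has been serialized to JSON with its usual `shards` array and `ok` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listShardsReply mirrors the parts of a listShards reply this sketch needs.
// Receiving the reply as JSON is an assumption made for illustration.
type listShardsReply struct {
	Shards []struct {
		ID   string `json:"_id"`
		Host string `json:"host"`
	} `json:"shards"`
	OK float64 `json:"ok"`
}

// shardsRegistered (hypothetical helper) reports whether the reply indicates
// a successful command and lists at least one shard.
func shardsRegistered(reply []byte) (bool, error) {
	var r listShardsReply
	if err := json.Unmarshal(reply, &r); err != nil {
		return false, fmt.Errorf("decode listShards reply: %w", err)
	}
	return r.OK == 1 && len(r.Shards) > 0, nil
}

func main() {
	// Sample replies are made up for this example.
	booting := []byte(`{"shards": [], "ok": 1}`)
	ready := []byte(`{"shards": [{"_id": "rs0", "host": "rs0/host-0:27017"}], "ok": 1}`)

	for _, reply := range [][]byte{booting, ready} {
		ok, err := shardsRegistered(reply)
		fmt.Printf("shards registered: %v, error: %v\n", ok, err)
	}
}
```

A probe built this way would keep mongos out of service until at least one shard is listed. Whether that is sufficient for writes to succeed would still need to be verified against the reproduction above.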