[BUG] Arbiter statefulset gets mistakenly deleted when reading stale `replset.arbiter.enabled`
Description
Describe the bug
Similar to https://jira.percona.com/browse/K8SPSMDB-433, we found that the arbiter statefulset can also be mistakenly deleted when the controller reads stale information, specifically `replset.arbiter.enabled`. If the current `replset.arbiter.enabled` is true but the controller reads a stale value of false, it immediately deletes the arbiter statefulset in the following branch:
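```go
if replset.Arbiter.Enabled {
	...
} else {
	// Runs whenever the CR that was read says the arbiter is disabled,
	// even if that CR is a stale copy.
	err := r.client.Delete(context.TODO(), psmdb.NewStatefulSet(...))
	...
}
```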
To Reproduce
Steps to reproduce the behavior:
1. Create a PerconaServerMongoDB with `arbiter.enabled` set to false. The controller talks to apiserver1.
2. Set `replset.arbiter.enabled` to true to enable the arbiter. The controller creates the arbiter statefulset. Meanwhile, apiserver2 gets stuck and still holds the view that `arbiter.enabled` is false.
3. The controller restarts after a node failure and talks to the stale apiserver2. The stale `arbiter.enabled` value from apiserver2 makes the controller delete the arbiter statefulset.
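The three steps above can be modeled without a real cluster. This is a toy simulation, not operator code: `psmdbView`, `cluster`, `naiveReconcile`, and the statefulset name are hypothetical stand-ins, and `naiveReconcile` only mirrors the shape of the `if replset.Arbiter.Enabled { ... } else { ... Delete ... }` branch quoted above.

```go
package main

import "fmt"

// psmdbView is a hypothetical, simplified snapshot of the CR as served by one apiserver.
type psmdbView struct {
	resourceVersion int
	arbiterEnabled  bool
}

// cluster holds the statefulsets that actually exist (the shared state behind all apiservers).
type cluster struct {
	statefulSets map[string]bool
}

// naiveReconcile trusts whatever view it reads, like the reported branch.
func naiveReconcile(view psmdbView, c *cluster, stsName string) {
	if view.arbiterEnabled {
		c.statefulSets[stsName] = true // create or keep the arbiter statefulset
	} else {
		delete(c.statefulSets, stsName) // deletes even when the view is stale
	}
}

func main() {
	c := &cluster{statefulSets: map[string]bool{}}
	const sts = "my-cluster-rs0-arbiter" // hypothetical name

	// Step 1: CR created with the arbiter disabled; both apiservers agree.
	apiserver1 := psmdbView{resourceVersion: 100, arbiterEnabled: false}
	apiserver2 := apiserver1
	naiveReconcile(apiserver1, c, sts)

	// Step 2: the arbiter is enabled; apiserver1 sees it, apiserver2 is stuck.
	apiserver1 = psmdbView{resourceVersion: 200, arbiterEnabled: true}
	naiveReconcile(apiserver1, c, sts)
	fmt.Println("after step 2, arbiter exists:", c.statefulSets[sts])

	// Step 3: the restarted controller reads from the stale apiserver2.
	naiveReconcile(apiserver2, c, sts)
	fmt.Println("after step 3, arbiter exists:", c.statefulSets[sts])
}
```

Running it prints `after step 2, arbiter exists: true` and then `after step 3, arbiter exists: false`: nothing in the naive branch can tell the stale view from the current one.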
Fix
We are willing to issue a patch to help fix this issue.
Similar to https://jira.percona.com/browse/K8SPSMDB-433, we can record the current resource version of the PerconaServerMongoDB as a label on the arbiter statefulset when creating it. Before deleting, the controller compares the resource version R1 of the PerconaServerMongoDB it has read with the version R2 stored in the statefulset's label. If R1 < R2, the PerconaServerMongoDB is stale, and the controller should not delete the arbiter statefulset even though its `arbiter.enabled` is false.
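A minimal sketch of the proposed guard, under stated assumptions: the label key `example.com/psmdb-resource-version` and the function `shouldDeleteArbiter` are hypothetical, and resource versions are parsed as unsigned integers. That matches etcd-backed apiservers in practice, but the Kubernetes API conventions describe `resourceVersion` as an opaque string that clients should not compare numerically, so this relies on an implementation detail. A statefulset without the label (for example one created before the patch) falls back to the existing delete behavior, and an unparsable version errs on the side of not deleting.

```go
package main

import (
	"fmt"
	"strconv"
)

// rvLabelKey is a hypothetical label key; the report does not name one.
const rvLabelKey = "example.com/psmdb-resource-version"

// shouldDeleteArbiter reports whether the arbiter statefulset may be deleted.
// crRV is R1, the resourceVersion of the PerconaServerMongoDB just read.
// stsLabels are the labels on the existing arbiter statefulset, which may hold R2,
// the CR resourceVersion recorded when the statefulset was created.
func shouldDeleteArbiter(arbiterEnabled bool, crRV string, stsLabels map[string]string) (bool, error) {
	if arbiterEnabled {
		return false, nil
	}
	labelRV, ok := stsLabels[rvLabelKey]
	if !ok {
		// No recorded version: keep the current behavior and delete.
		return true, nil
	}
	r1, err := strconv.ParseUint(crRV, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse CR resourceVersion %q: %w", crRV, err)
	}
	r2, err := strconv.ParseUint(labelRV, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse label resourceVersion %q: %w", labelRV, err)
	}
	// R1 < R2: the CR we read predates the statefulset, so it is stale.
	return r1 >= r2, nil
}

func main() {
	labels := map[string]string{rvLabelKey: "1200"}

	// Stale read: arbiter.enabled=false, but R1 (1100) < R2 (1200).
	del, _ := shouldDeleteArbiter(false, "1100", labels)
	fmt.Println("stale read, delete:", del)

	// Fresh read where the arbiter really was disabled after creation.
	del, _ = shouldDeleteArbiter(false, "1300", labels)
	fmt.Println("fresh read, delete:", del)
}
```

Running it prints `stale read, delete: false` and then `fresh read, delete: true`.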