[BUG] Arbiter statefulset gets mistakenly deleted when reading stale `replset.arbiter.enabled`

Description

Describe the bug

Similar to https://jira.percona.com/browse/K8SPSMDB-433, we find that the arbiter statefulset can also be mistakenly deleted when the controller reads stale information, specifically `replset.arbiter.enabled`. If the current value of `replset.arbiter.enabled` is true but the controller reads a stale value of false, it deletes the arbiter statefulset immediately, as shown below:

 

    if replset.Arbiter.Enabled {
        ...
    } else {
        err := r.client.Delete(context.TODO(), psmdb.NewStatefulSet(...))
        ...
    }

 

To Reproduce

Steps to reproduce the behavior:

  1. Create a PerconaServerMongoDB with `replset.arbiter.enabled` set to false. The controller talks to apiserver1.

  2. Change `replset.arbiter.enabled` to true to enable the arbiter. The arbiter statefulset is created. Meanwhile, apiserver2 gets stuck and still holds the view that `replset.arbiter.enabled` is false.

  3. The controller restarts after a node failure and talks to the stale apiserver2. The stale `replset.arbiter.enabled` value from apiserver2 makes the controller delete the arbiter statefulset.
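The sequence above can be modeled in a few lines of Go. This is only a sketch of the race, not operator code: `clusterView` and `reconcileArbiter` are hypothetical names, and the resource version is simplified to an integer.

```go
package main

import "fmt"

// clusterView is a hypothetical stand-in for what one apiserver returns for
// the PerconaServerMongoDB object. It is not a type from the operator.
type clusterView struct {
	resourceVersion int  // simplified: a monotonically increasing integer
	arbiterEnabled  bool // models replset.arbiter.enabled
}

// reconcileArbiter mimics the unguarded branch from the report and returns
// whether the arbiter statefulset exists after the reconcile: it is created
// or kept when the flag is true and deleted otherwise.
func reconcileArbiter(view clusterView) bool {
	if view.arbiterEnabled {
		return true // create or keep the statefulset
	}
	return false // delete the statefulset
}

func main() {
	apiserver1 := clusterView{resourceVersion: 2, arbiterEnabled: true}  // up to date
	apiserver2 := clusterView{resourceVersion: 1, arbiterEnabled: false} // stuck on the old state

	// Step 2: the controller reconciles against the up-to-date apiserver1.
	fmt.Println("arbiter exists after apiserver1:", reconcileArbiter(apiserver1))

	// Step 3: after the restart the controller reads from the stale apiserver2.
	fmt.Println("arbiter exists after apiserver2:", reconcileArbiter(apiserver2))
}
```

The second reconcile reports that the statefulset no longer exists, even though the desired state recorded at resource version 2 still has the arbiter enabled.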

 

Fix

We are willing to issue a patch to help fix this issue.

Similar to https://jira.percona.com/browse/K8SPSMDB-433, we can label the arbiter statefulset with the current resource version of the PerconaServerMongoDB when creating it. The controller can then always compare the resource version R1 of the PerconaServerMongoDB it reads with the version R2 labeled on the statefulset. If R1 < R2, we know the PerconaServerMongoDB is stale and the controller should not delete the arbiter statefulset, even when `arbiter.enabled` is false.
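A minimal sketch of the proposed check, under stated assumptions: `isStaleView` and `shouldDeleteArbiter` are hypothetical helpers, not functions from the operator, and the two resource versions are passed in as plain strings rather than read from real objects. The sketch also assumes resource versions parse as unsigned integers. Kubernetes documents them as opaque to clients, so the numeric comparison is a simplification that may not hold on every backend.

```go
package main

import (
	"fmt"
	"strconv"
)

// isStaleView reports whether the resource version the controller just read
// (observedRV, R1) is older than the version recorded on the arbiter
// statefulset when it was created (recordedRV, R2).
// Assumption: both values parse as base-10 unsigned integers.
func isStaleView(observedRV, recordedRV string) (bool, error) {
	r1, err := strconv.ParseUint(observedRV, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse observed resource version %q: %w", observedRV, err)
	}
	r2, err := strconv.ParseUint(recordedRV, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse recorded resource version %q: %w", recordedRV, err)
	}
	return r1 < r2, nil
}

// shouldDeleteArbiter decides whether the arbiter statefulset may be deleted.
// It never deletes when the arbiter is enabled, when the view is stale, or
// when the versions cannot be compared.
func shouldDeleteArbiter(arbiterEnabled bool, observedRV, recordedRV string) (bool, error) {
	if arbiterEnabled {
		return false, nil
	}
	stale, err := isStaleView(observedRV, recordedRV)
	if err != nil {
		return false, err // fail safe: keep the statefulset
	}
	return !stale, nil
}

func main() {
	// Stale read: R1 (100) < R2 (150), so the statefulset is kept.
	del, err := shouldDeleteArbiter(false, "100", "150")
	fmt.Println(del, err) // false <nil>

	// Fresh read: R1 (200) >= R2 (150), so deletion is allowed.
	del, err = shouldDeleteArbiter(false, "200", "150")
	fmt.Println(del, err) // true <nil>
}
```

Returning `false` on a parse error is a deliberate choice in this sketch: when the versions cannot be compared, keeping the statefulset is the less destructive outcome.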

Environment

None


Activity



Lalit Choudhary May 6, 2021 at 10:16 AM

Hi

Thank you for the patch and report.

I see that the patch provided on https://perconadev.atlassian.net/browse/K8SPSMDB-433#icft=K8SPSMDB-433 has already been merged. If you want, you can create a separate patch to fix this issue.

Once again thank you for your contribution.

 

sieveteam April 15, 2021 at 5:39 PM

Since this issue is very similar to https://jira.percona.com/browse/K8SPSMDB-433, we will issue one PR to help fix both, likely this weekend.

Created April 15, 2021 at 5:37 PM
Updated March 5, 2024 at 4:55 PM
