[BUG] Arbiter statefulset gets mistakenly deleted when reading stale `replset.arbiter.enabled`

Description

Describe the bug

Similar to https://jira.percona.com/browse/K8SPSMDB-433, the arbiter statefulset can also be mistakenly deleted when the controller reads stale information, specifically `replset.arbiter.enabled`. If the current value of `replset.arbiter.enabled` is true but the controller reads a stale value of false, it deletes the arbiter statefulset immediately, as shown below:

if replset.Arbiter.Enabled {
  ...
} else {
  err := r.client.Delete(context.TODO(), psmdb.NewStatefulSet(...))
  ...
}

To Reproduce

Steps to reproduce the behavior:

1. Create a PerconaServerMongoDB with `arbiter.enabled` set to false. The controller talks to apiserver1.
2. Change `arbiter.enabled` to true to enable the arbiter. The arbiter statefulset is created. Meanwhile, apiserver2 gets stuck and still holds the view that `arbiter.enabled` is false.
3. The controller restarts after a node failure and talks to the stale apiserver2. The stale `arbiter.enabled` value from apiserver2 makes the controller delete the arbiter statefulset.
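The sequence above can be modeled in a few lines of Go. This is only a toy sketch of the race, not operator code: the `view` and `cluster` types and the `reconcileArbiter` function are hypothetical names introduced for illustration, and the real controller acts through the Kubernetes client rather than an in-memory flag.

```go
package main

// view is a hypothetical stand-in for the PerconaServerMongoDB state
// that one apiserver returns to the controller.
type view struct {
	resourceVersion uint64 // larger means newer in this toy model
	arbiterEnabled  bool
}

// cluster is a hypothetical model of the state the controller acts on.
type cluster struct {
	arbiterStatefulSetExists bool
}

// reconcileArbiter mirrors the unguarded branch from the description:
// it trusts whatever view it was handed, fresh or stale.
func reconcileArbiter(v view, c *cluster) {
	if v.arbiterEnabled {
		c.arbiterStatefulSetExists = true
	} else {
		// A stale view with arbiterEnabled == false lands here and
		// removes a statefulset that the current spec still requires.
		c.arbiterStatefulSetExists = false
	}
}
```

Reconciling first with the fresh view from apiserver1 (arbiter enabled) and then, after the restart, with the stale view from apiserver2 (arbiter disabled) leaves `arbiterStatefulSetExists` false, which corresponds to step 3.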

Fix

We are willing to issue a patch to help fix this issue.

Similar to https://jira.percona.com/browse/K8SPSMDB-433, we can record the current resource version of the PerconaServerMongoDB as a label on the arbiter statefulset when creating it. The controller can then always compare the resource version R1 of the PerconaServerMongoDB it reads with the resource version R2 labeled on the statefulset. If R1 < R2, the PerconaServerMongoDB is stale, and the controller should not delete the arbiter statefulset even when `arbiter.enabled` is false.
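A minimal sketch of the proposed comparison follows. It is not the merged patch: `isStaleCR` and `shouldDeleteArbiter` are hypothetical helpers, and the code assumes both resource versions can be parsed as unsigned decimal integers. Kubernetes has traditionally documented resource versions as opaque strings, so the numeric comparison is an assumption carried over from the proposal rather than a guaranteed property of the API.

```go
package main

import (
	"fmt"
	"strconv"
)

// isStaleCR reports whether the resource version read from the custom
// resource (R1) is older than the one recorded on the statefulset (R2).
// Assumption: both values are decimal integers.
func isStaleCR(crResourceVersion, recordedResourceVersion string) (bool, error) {
	r1, err := strconv.ParseUint(crResourceVersion, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse CR resource version %q: %w", crResourceVersion, err)
	}
	r2, err := strconv.ParseUint(recordedResourceVersion, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse recorded resource version %q: %w", recordedResourceVersion, err)
	}
	return r1 < r2, nil
}

// shouldDeleteArbiter allows deletion only when the arbiter is disabled
// and the custom resource is not older than the recorded version.
// On a parse error it refuses to delete, erring on the safe side.
func shouldDeleteArbiter(arbiterEnabled bool, crResourceVersion, recordedResourceVersion string) (bool, error) {
	if arbiterEnabled {
		return false, nil
	}
	stale, err := isStaleCR(crResourceVersion, recordedResourceVersion)
	if err != nil {
		return false, err
	}
	return !stale, nil
}
```

With this guard, a controller that reads `arbiter.enabled` as false at resource version 100 while the statefulset records 105 would skip the deletion, whereas the same read at version 110 would proceed. Refusing to delete when a version cannot be parsed is a design choice of this sketch: a leftover statefulset is easier to recover from than a wrongly deleted one.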

Environment

None

Linked issues

relates to

K8SPSMDB-433

[BUG] Config statefulset gets mistakenly deleted when reading stale `spec.sharding.enabled`

Activity

Lalit Choudhary May 6, 2021 at 10:18 AM

Sorry, I see you have already created a PR: https://github.com/percona/percona-server-mongodb-operator/pull/639

Thanks.

Lalit Choudhary May 6, 2021 at 10:16 AM

Hi @sieveteam

Thank you for the patch and report.

I see the patch provided on https://perconadev.atlassian.net/browse/K8SPSMDB-433#icft=K8SPSMDB-433 has already been merged. If you want, you can create a separate patch to fix this issue.

Once again thank you for your contribution.

sieveteam April 15, 2021 at 5:39 PM

Since this issue is very similar to https://jira.percona.com/browse/K8SPSMDB-433, we will issue one PR to help fix both, likely this weekend.

Details
Assignee: Unassigned
Reporter: sieveteam
Affects versions: 1.7.0, 1.8.0, 1.9.0
Priority: High

Created April 15, 2021 at 5:37 PM

Updated March 5, 2024 at 4:55 PM
