MongoDB operator crashes when pause is enabled and there are no pods created
Description
Describe the bug
When the user provides an incorrect configuration that prevents the StatefulSets from creating pods and also enables pause, the operator keeps crashing.
To Reproduce
Steps to reproduce the behavior:
Deploy a MongoDB cluster with pause enabled, 3 replicas, and a configuration that prevents the StatefulSet from creating pods (for example, a non-existent runtimeClassName).
Expected behavior
The operator should run normally and be available.
Current behavior
The operator keeps crashing and stays in the CrashLoopBackOff state.
Root Cause
The root cause is that when pause is enabled, the operator tries to delete the replica set pods. Before deleting them, it checks whether a pod is the primary at this line: https://github.com/percona/percona-server-mongodb-operator/blob/main/pkg/controller/perconaservermongodb/finalizers.go#L172. If no pods have been created, the operator accesses index 0 of an empty PodList, which causes an index-out-of-range panic and crashes the operator.
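A minimal sketch of the kind of guard that would avoid this crash is shown below. It is not the operator's actual code or the fix that was applied: Pod and PodList are simplified stand-ins for the Kubernetes types, and firstPod, podsToDelete and ErrNoPods are hypothetical names used only for illustration. The idea is to check the list length before reading the first element.

```go
package main

import "errors"

// Pod and PodList are simplified, hypothetical stand-ins for the
// Kubernetes Pod and PodList types used by the operator.
type Pod struct {
	Name string
}

type PodList struct {
	Items []Pod
}

// ErrNoPods signals that the StatefulSet has not created any pods.
var ErrNoPods = errors.New("no pods found for replica set")

// firstPod returns the first pod in the list. It returns ErrNoPods for an
// empty list instead of indexing Items[0], which would panic.
func firstPod(list PodList) (Pod, error) {
	if len(list.Items) == 0 {
		return Pod{}, ErrNoPods
	}
	return list.Items[0], nil
}

// podsToDelete is a hypothetical step of the pause handling: it returns
// the names of the pods to delete, or nil when no pods exist.
func podsToDelete(list PodList) []string {
	if _, err := firstPod(list); errors.Is(err, ErrNoPods) {
		return nil // nothing was created, so there is nothing to delete
	}
	names := make([]string, 0, len(list.Items))
	for _, p := range list.Items {
		names = append(names, p.Name)
	}
	return names
}
```

With a guard like this, a paused cluster whose StatefulSet never created pods would be treated as having nothing to delete, and reconciliation could continue instead of panicking.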
Desktop (please complete the following information):
OS: Ubuntu
Reproduced on latest release v0.8.3