Backup can change cluster status to error

Description

The problem can be reproduced on OpenShift and on a 4-node k3s cluster.

Steps to reproduce:
Set up a fresh cluster and install the operator.

Set up minio with Helm, merge the backup configuration into cr.yaml, and apply cr.yaml.
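The backup section merged into cr.yaml would look roughly like this (bucket name, credentials secret, and minio endpoint are placeholders assumed for this setup):

```yaml
backup:
  enabled: true
  storages:
    minio:
      type: s3
      s3:
        bucket: backups                                # placeholder bucket name
        credentialsSecret: my-cluster-name-backup-minio # placeholder secret with S3 keys
        endpointUrl: http://minio-service:9000          # in-cluster minio endpoint (assumed)
        region: us-east-1
```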

Apply backups one by one (changing the backup name each time) until the psmdb status becomes error:
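Each on-demand backup is a separate PerconaServerMongoDBBackup resource; a minimal sketch, assuming the cluster is named my-cluster-name and the storage is named minio as above:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: backup1            # increment for each run: backup2, backup3, ...
spec:
  psmdbCluster: my-cluster-name
  storageName: minio
```

Applying several of these in sequence with kubectl apply reproduces the "apply backups one by one" step.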

While the cluster is in the error state, backups fail.

The operator produces periodic messages with a new error:

While in the error state, mongo clients connect slowly (SSL enabled):

The number of connections grows slowly:

The mongo server is reachable from the operator host:
curl -k https://my-cluster-name-rs0.default.svc.cluster.local:27017/
curl: (52) NSS: client certificate not found (nickname not specified)

The operator has many open TCP connections:

After kubectl delete pod percona-server-mongodb-operator-588db759d-fcgww,
the psmdb status returns to normal and backups are possible again.

The stale connections could be related to a similar issue:
https://jira.percona.com/browse/K8SPSMDB-271

The error can be reproduced consistently on my host when it is idle. If the CPU is busy with other tasks and Kubernetes is slow, the backup does not trigger the psmdb error state or the subsequent backup failures.

Environment

None

AFFECTED CS IDs

CS0012909

Activity

Sergey Pronin October 19, 2020 at 9:56 AM

Nickolay Ihalainen October 14, 2020 at 1:49 PM

The problem does not happen with "allowUnsafeConfigurations: true" (no SSL), and the number of connections stays stable.

Nickolay Ihalainen October 14, 2020 at 1:25 PM

Describe and log output:

Duplicate

Created October 14, 2020 at 1:22 PM
Updated March 5, 2024 at 5:04 PM
Resolved October 19, 2020 at 9:56 AM