We had a case where restore status was showing true without any restore running.

Description

We saw a case where restore: true appeared in postgrescluster without any restore running.

Due to this, the restore was failing with the following messages:

time="2024-02-02T13:26:06Z" level=info msg="Waiting for another restore to finish" PerconaPGRestore=dbaas-postgresql-oneitsm-prod/restore-2024-01-29-21-03 controller=perconapgrestore controllerGroup=pgv2.percona.com controllerKind=PerconaPGRestore name=restore-2024-01-29-21-03 namespace=dbaas-postgresql-oneitsm-prod reconcileID=b0d66b3e-0a37-4fe9-a91a-fa825ca67fa3 request=dbaas-postgresql-oneitsm-prod/restore-2024-01-29-21-03 version=
time="2024-02-02T13:26:11Z" level=info msg="Waiting for another restore to finish" PerconaPGRestore=dbaas-postgresql-oneitsm-prod/restore-2024-01-29-21-03 controller=perconapgrestore controllerGroup=pgv2.percona.com controllerKind=PerconaPGRestore name=restore-2024-01-29-21-03 namespace=dbaas-postgresql-oneitsm-prod reconcileID=2112dbc4-ffe7-4858-aa8c-56f76e239f48 request=dbaas-postgresql-oneitsm-prod/restore-2024-01-29-21-03 version=
time="2024-02-02T13:26:16Z" level=info msg="Waiting for another restore to finish" PerconaPGRestore=dbaas-postgresql-oneitsm-prod/restore-2024-01-29-21-03 controller=perconapgrestore controllerGroup=pgv2.percona.com controllerKind=PerconaPGRestore name=restore-2024-01-29-21-03 namespace=dbaas-postgresql-oneitsm-prod reconcileID=897443fc-74fc-4768-bcfd-24125678462f request=dbaas-postgresql-oneitsm-prod/restore-2024-01-29-21-03 version=

We had to manually edit the CR so that the restore could be performed.

Environment

None

AFFECTED CS IDs

CS0046315 - Backup status out of sync

Activity

Jobin Augustine May 23, 2024 at 7:36 AM

The status between pgBackRest and the Operator is again getting out of sync under rare conditions.
Recently, we had a customer case where annotations indicated that the backup was “running” without any backups actually running. This prevented further backups. To fix the problem, we manually removed the annotations.

kubectl annotate pg/custername pgv2.percona.com/backup-in-progress-
kubectl annotate pg/custername postgres-operator.crunchydata.com/pgbackrest-backup-

So, we need a mechanism in the operator to periodically check/poll the status and maintain the annotations/status at the operator level.
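
As a rough illustration of the kind of check meant here, the sketch below decides which backup annotations are stale when no backup is actually running. It is not the operator's code: the function names are hypothetical, and how `backup_running` is determined (for example by inspecting jobs or pgBackRest) is left as an assumption. Only the two annotation keys come from the commands above.

```python
# Hypothetical sketch; these helpers do not exist in the operator.
# The annotation keys are the two that were removed manually above.
BACKUP_ANNOTATIONS = (
    "pgv2.percona.com/backup-in-progress",
    "postgres-operator.crunchydata.com/pgbackrest-backup",
)


def stale_backup_annotations(annotations, backup_running):
    """Return the backup annotation keys that should be cleared.

    annotations: dict of annotation key -> value read from the cluster object.
    backup_running: True if a backup is really active. How the caller finds
    this out is an assumption of this sketch and is not shown.
    """
    if backup_running:
        # A backup is in flight, so the markers are legitimate.
        return []
    return [key for key in BACKUP_ANNOTATIONS if key in annotations]


def reconcile_once(annotations, backup_running):
    """Return a copy of the annotations with stale backup markers removed."""
    stale = set(stale_backup_annotations(annotations, backup_running))
    return {k: v for k, v in annotations.items() if k not in stale}
```

A periodic loop would call `reconcile_once` with freshly observed state and write the result back only when it differs from the current annotations.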

Jobin Augustine May 6, 2024 at 12:28 PM

This happened in a customer environment. We don’t know how it happened or the steps to reproduce it.

Does the operator have logic to ensure the status is in perfect sync with the restore running?

natalia.marukovich May 6, 2024 at 11:05 AM

hi! Could you please provide steps to reproduce?

Done

Details

Needs QA

Yes

Created February 2, 2024 at 3:57 PM
Updated July 1, 2024 at 1:36 PM
Resolved June 19, 2024 at 8:16 AM