Unable to delete failed backup jobs

Description

This issue occurred in a production environment running PBM 1.6.0.  I have not managed to reproduce the initial race condition in 1.8.1, but I have confirmed that the resulting inability to delete jobs in a bad state still applies there, and that is what I consider to be the bug.

Steps to reproduce:

  • PBM is configured to take PITR backups every 10 minutes

  • A full backup is triggered

  • The full backup fails because it started at the exact point a PITR backup was being taken.

  • There is no way to delete the failed job record, except by manual deletion from Mongo or forcing a resync.  There may be circumstances where forcing a resync doesn't work, as job results have been written to disk.
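
The manual deletion mentioned in the last step could be sketched roughly as follows. This is only an illustration, not a supported PBM procedure: it assumes the record lives in a pbmBackups collection whose documents carry a `name` field and a `status` field set to `"error"` for failed jobs (the field names and the status value are assumptions, as are the helper names `failed_backup_filter` and `delete_failed_backup`). The only driver call relied on is `delete_one`, as provided by pymongo's `Collection`.

```python
from typing import Any, Protocol


class SupportsDeleteOne(Protocol):
    """Minimal slice of pymongo's Collection API used below."""

    def delete_one(self, filter: dict[str, Any]) -> Any: ...


def failed_backup_filter(name: str) -> dict[str, str]:
    # Match on both name and status so a healthy backup is never removed.
    # Assumption: failed jobs are stored with status "error".
    return {"name": name, "status": "error"}


def delete_failed_backup(backups: SupportsDeleteOne, name: str) -> int:
    """Delete one failed backup record; return the number of documents removed."""
    result = backups.delete_one(failed_backup_filter(name))
    return result.deleted_count
```

With pymongo, `backups` would be the pbmBackups collection object obtained from a `MongoClient`; the database holding it is deployment-specific and not shown here. Note that this only removes the metadata record and leaves any files already written to backup storage untouched.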

 

You will note from the timestamps this occurred a while ago.  The impact of the failure has only recently come to light, as cleanup of old backups failed.

 

Logs from the initial backup failure:

 

 

The above failure created the following record in pbmBackups:

 

 

Which pbm status 1.6.0 reported as:

 

 

pbm status 1.8.1 reports the same status.

From my perspective, the bug isn't that the backup failed – failures can happen for lots of reasons.  The bug is that PBM provides no way to delete the failed job without manual intervention.

It should be possible to force PBM to clean up any traces of an incomplete backup, possibly with an extra flag.

 

 

 

Environment

None

Activity

Aaditya Dubey December 10, 2023 at 8:36 AM

Hi,

Closing the report, as there has been no activity for a long time.

Aaditya Dubey January 27, 2023 at 2:19 PM

Hi,

Thank you for the report.
Please let me know if the issue still persists.

Incomplete

Details

Created August 19, 2022 at 11:06 AM
Updated December 10, 2023 at 8:36 AM
Resolved December 10, 2023 at 8:36 AM