Unable to delete failed backup jobs

Incomplete

Description

This issue occurred in a production environment running PBM 1.6.0. I have not managed to reproduce the initial race condition in 1.8.1, but I have confirmed that the resulting inability to delete jobs in a bad state still applies there, which is what I consider to be the bug.

Steps to reproduce:

  • PBM is configured to take PITR backups every 10 minutes

  • A full backup is triggered

  • The full backup fails because it starts at the exact moment a PITR backup is being taken.

  • There is no way to delete the failed job record, except by manual deletion from Mongo or forcing a resync.  There may be circumstances where forcing a resync doesn't work, as job results have been written to disk.
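
The manual cleanup mentioned in the last step can be sketched roughly as follows. This is only an illustration under stated assumptions, not a supported PBM procedure: the helper names are hypothetical, and the `status` and `name` fields and the `"error"` value are assumptions about the backup metadata schema that should be checked against the actual record before deleting anything. The function accepts any pymongo-style collection object and defaults to a dry run.

```python
from typing import Any, Dict, List, Sequence

# Assumption: failed backup records carry status "error".
# Verify this against the actual document in pbmBackups first.
FAILED_STATUSES: Sequence[str] = ("error",)


def failed_backup_filter(statuses: Sequence[str] = FAILED_STATUSES) -> Dict[str, Any]:
    """Build a MongoDB query filter matching backup records in a failed state."""
    return {"status": {"$in": list(statuses)}}


def delete_failed_backups(collection: Any, dry_run: bool = True) -> List[Any]:
    """List failed backup records and, unless dry_run is set, delete them.

    `collection` is any object exposing pymongo-style `find(filter)` and
    `delete_one(filter)` methods. Returns the names of the matched records.
    """
    # Materialize the results so that deleting does not disturb iteration.
    docs = list(collection.find(failed_backup_filter()))
    names = [doc.get("name") for doc in docs]
    if not dry_run:
        for doc in docs:
            # Delete by _id so that only the inspected records are removed.
            collection.delete_one({"_id": doc["_id"]})
    return names
```

With pymongo installed, the collection would be obtained from a `MongoClient` connected to the cluster; the database holding `pbmBackups` is assumed here to be `admin`. Running with `dry_run=True` first and reviewing the returned names is the safer order of operations.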

 

You will note from the timestamps this occurred a while ago.  The impact of the failure has only recently come to light, as cleanup of old backups failed.

 

Logs from the initial backup failure:

 

 

The above failure created the following record in pbmBackups:

 

 

Which pbm status 1.6.0 reported as:

 

 

pbm status 1.8.1 reports the same status as

From my perspective, the bug isn't that the backup failed; failures can happen for lots of reasons. The bug is that PBM provides no way to delete the failed job without manual intervention:

It should be possible to force PBM to clean up any traces of an incomplete backup, possibly with an extra flag.

 

 

 

Environment

None

Details

Created August 19, 2022 at 11:06 AM
Updated December 10, 2023 at 8:36 AM
Resolved December 10, 2023 at 8:36 AM

Activity

Aaditya Dubey, December 10, 2023 at 8:36 AM

Hi,

Closing the report because there has been no activity for a long time.

Aaditya Dubey, January 27, 2023 at 2:19 PM

Hi,

Thank you for the report.
Please let me know if the issue still persists.