PBM-927: Unable to delete failed backup jobs
Incomplete
Description
Environment
None
Created August 19, 2022 at 11:06 AM
Updated December 10, 2023 at 8:36 AM
Resolved December 10, 2023 at 8:36 AM
Activity
Aaditya Dubey, December 10, 2023 at 8:36 AM
Hi,
Closing the report, as there has been no activity for a long time.
Aaditya Dubey, January 27, 2023 at 2:19 PM
Hi,
Thank you for the report.
Please let me know if the issue still persists.
This issue occurred in a production environment running 1.6.0. I have not managed to reproduce the initial race condition in 1.8.1, but I have confirmed that the resulting inability to delete jobs in a bad state still applies, which is what I consider to be the bug.
Steps to reproduce:
1. PBM is configured to take PITR backups every 10 minutes.
2. A full backup is triggered.
3. The full backup fails because it started at the exact point a PITR backup was being taken.
There is no way to delete the failed job record, except by manual deletion from Mongo or forcing a resync. There may be circumstances where forcing a resync doesn't work, as job results have been written to disk.
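The manual deletion mentioned above could look roughly like the following. This is only a sketch of the workaround, not something PBM provides: it assumes that failed records in pbmBackups carry a `name` field and a `status` field set to "error", and the helper names `failed_backup_filter` and `delete_failed_backup` are hypothetical. Inspect a real record before running anything like this against production.

```python
from typing import Any, Protocol


class BackupCollection(Protocol):
    """The one method of a pymongo-style collection that this sketch needs."""

    def delete_one(self, filter: dict[str, Any]) -> Any: ...


def failed_backup_filter(backup_name: str) -> dict[str, Any]:
    """Build a filter that matches one backup record only if it failed.

    ASSUMPTION: records have `name` and `status` fields and a failed
    backup is marked with status == "error". Verify this on a real record.
    """
    return {"name": backup_name, "status": "error"}


def delete_failed_backup(collection: BackupCollection, backup_name: str) -> int:
    """Delete a single failed backup record and return how many were removed."""
    result = collection.delete_one(failed_backup_filter(backup_name))
    return result.deleted_count


# Hypothetical usage with pymongo (the database name "admin" is an assumption):
#   from pymongo import MongoClient
#   backups = MongoClient(uri)["admin"]["pbmBackups"]
#   removed = delete_failed_backup(backups, "<backup name from pbm status>")
```

Including the status in the filter means a successful backup with the same name is never removed by accident. This only removes the metadata record; any files already written to storage are left in place.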
You will note from the timestamps this occurred a while ago. The impact of the failure has only recently come to light, as cleanup of old backups failed.
Logs from the initial backup failure:
The above failure created the following record in pbmBackups:
Which pbm status 1.6.0 reported as:
pbm status 1.8.1 reports the same status as:
From my perspective, the bug isn't that the backup failed – failures can happen for lots of reasons. The bug is that PBM provides no way to delete the failed job without manual intervention:
It should be possible to force PBM to clean up any traces of an incomplete backup, possibly with an extra flag?