pbm delete-backup does not wait for backup to delete, does not queue up additional requests

Description

Hi,

I’m trying to clean up old backups on a system that was supposed to be pruned by a script that turned out not to be working, and this bug makes that really tedious. I’m running pbm 2.3.1 (I started on 2.2.1 but upgraded to see whether that fixed it before filing the bug).

When I run e.g. pbm delete-backup -y 2023-08-20T08:44:13Z I see this:

However, though it says “waiting for delete to be done… [done]” it is not actually done, rather it’s pending in the background. Logs show e.g.:

Note that each backup (~100-300GB) takes 5-8 minutes to delete; most of the ones I'm deleting are incremental (not base) backups.

What makes this really frustrating is that if I queue up multiple at the same time they will all report that they are working (and done), e.g.

However, if nothing was running when I ran those 4 commands, it would delete the first, then delete the second once the first finished, and any additional pending commands would just be ignored, even though they look like they worked.

In fact, unless I’m tailing the logs I can’t even tell when the delete finishes or that it’s doing anything.

Since I have some 50 backups to delete I’ve been spending days running two commands at a time, then waiting 10-15 minutes and doing it again. There is no way to detect when it finishes, so I just have to wait for the logs to show that I can send the next command(s).

Environment

pbm 2.3.1; the database being backed up is Percona Server for MongoDB 4.4.23-22.

The agents are running either in Docker containers or in Kubernetes pods.

Activity

Richard Bateman June 25, 2024 at 2:37 AM

Maybe it’s just me, but when a command says it worked but silently does nothing, that seems like a bug.

Aaditya Dubey June 18, 2024 at 8:33 AM

Hi

Thank you for the explanations and use case; it is more like a feature request than a bug. Sending the concern to engineering for further review and updates.

Richard Bateman June 17, 2024 at 5:35 PM

The primary use case where one might want to do this (for me at least) is when you want to prune old backups but still want to keep historical snapshots just in case. This is a very common backup strategy, and one that I wish PBM had support for natively.

For example our current policy is to keep daily snapshots for the last 14 days, weekly for the last 6 weeks, monthly for the last 6 months, and yearly for the last 2 years. Ideally you wouldn’t need to keep extras when they overlap.

If you stay on top of that it’s fine, but if something is wrong or if you ever change your strategy you may end up with a lot of older backups that you want to delete but still want to keep one for each week, or each month, etc – which deleting all older than X wouldn’t allow you to do.
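As a rough illustration of the selection such a policy implies, here is a minimal Python sketch. It is not part of PBM. The names RULES, parse and plan_prune are hypothetical, and it assumes backup names are UTC timestamps in the format used in this report (e.g. 2023-08-20T08:44:13Z). It also assumes that "daily for the last 14 days" means the newest backup in each of the 14 most recent days that have one (likewise for weeks, months and years), and it ignores any dependency between incremental and base backups.

```python
from datetime import datetime, timezone

# Hypothetical rule set mirroring the policy described above:
# (function mapping a timestamp to its bucket, number of recent buckets to keep).
RULES = [
    (lambda t: t.date(), 14),              # daily
    (lambda t: t.isocalendar()[:2], 6),    # weekly, keyed by (ISO year, ISO week)
    (lambda t: (t.year, t.month), 6),      # monthly
    (lambda t: t.year, 2),                 # yearly
]


def parse(name):
    """Parse a backup name such as 2023-08-20T08:44:13Z into an aware datetime."""
    return datetime.strptime(name, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def plan_prune(names):
    """Return (keep, delete), both lists of backup names ordered newest first."""
    ordered = sorted(names, key=parse, reverse=True)
    keep = set()
    for bucket_of, limit in RULES:
        seen = []
        for name in ordered:
            bucket = bucket_of(parse(name))
            if bucket in seen:
                continue          # a newer backup already represents this bucket
            if len(seen) == limit:
                break             # enough buckets kept for this rule
            seen.append(bucket)
            keep.add(name)        # overlapping rules share the same backup
    kept = [n for n in ordered if n in keep]
    deleted = [n for n in ordered if n not in keep]
    return kept, deleted
```

Printing the second list returned by plan_prune gives a dry-run view of what would be removed; a backup that satisfies several rules at once is kept only once, so overlaps do not cost extra storage.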

Ideally the “wait” flags would actually do what they claim and/or delete could queue things up correctly – or at least it would throw an error instead of accepting and ignoring the command. Alternatives I could think of if that’s for some reason difficult, though:

  • You could just implement a pruning strategy like I mentioned, with a “dry-run” option so someone can see what will be deleted before running it. That would be fantastic and would have saved me a lot of work; I ended up writing a node script to do it for me.

  • You could make the delete command capable of accepting multiple arguments, each a timestamp, and have it delete all of those.

  • You could make it detect that the delete command is still running and can’t accept additional deletes and then either a) wait until it finishes and issue the command then or b) return a non-zero code so that a script could do that.
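Option (a) in the last bullet could look roughly like the following Python sketch. It only shows the control flow: start_delete and is_finished are hypothetical callables supplied by the caller, and no particular PBM command or completion check is assumed, since the lack of a reliable one is the subject of this report.

```python
import time


def delete_one_at_a_time(names, start_delete, is_finished,
                         poll_seconds=30.0, timeout_seconds=3600.0,
                         sleep=time.sleep):
    """Issue deletes strictly one after another and return the completed names.

    start_delete(name) -> None : hypothetical, submits one delete request.
    is_finished(name)  -> bool : hypothetical, True once that delete is done.
    """
    completed = []
    for name in names:
        start_delete(name)
        waited = 0.0
        # Do not submit the next request until this one has really finished.
        while not is_finished(name):
            if waited >= timeout_seconds:
                # Fail loudly instead of silently dropping the request.
                raise TimeoutError(f"delete of {name} did not finish in time")
            sleep(poll_seconds)
            waited += poll_seconds
        completed.append(name)
    return completed
```

Raising on timeout corresponds to option (b): a wrapper script would exit non-zero rather than report success for a request that was never carried out.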

The issue itself is a bit annoying, but what makes it completely broken is that it essentially claims to work – but doesn’t.

Sandra Romanchenko June 13, 2024 at 12:51 PM

Hi

Is it possible to use the delete-backup command with the older-than option instead of deleting backups one by one? If not, can you please explain why, so we can better understand and possibly address your use case in the future?

As an alternative, you could also use the new cleanup command, which has the same older-than option and is more suitable if you have PITR chunks.

Please note that both commands have a dry-run option, so you can check which backups will be deleted before actually executing the command.

For cleanup you can also use the wait flag, so the command won’t finish until the deletion is actually complete.

Aaditya Dubey February 7, 2024 at 2:47 PM

Hi

Thank you for the report.
We are checking on this and will keep you posted.
Sorry for the delay in responding!

Details

Needs QA

Yes

Needs Doc

Yes

Created January 9, 2024 at 9:40 PM
Updated December 10, 2024 at 12:50 PM