Backup jobs hang indefinitely

Description

I did not configure a backup volume, yet backup cron jobs still start and then hang indefinitely.

Some jobs have been in the "Pending" state for 5 days.

I think these jobs should fail with an error after some period of time.
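
A minimal sketch of the requested behavior, assuming each backup job exposes a state string and a creation time. The function name `resolve_state`, the one-hour `PENDING_TIMEOUT`, and the "Failed" state name are hypothetical and chosen only for illustration; this is not the operator's actual code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: the issue does not say how long a job may stay Pending.
PENDING_TIMEOUT = timedelta(hours=1)


def resolve_state(state, created_at, now, timeout=PENDING_TIMEOUT):
    """Return "Failed" for a job that has been Pending longer than `timeout`.

    Any other job keeps its current state.
    """
    if state == "Pending" and now - created_at > timeout:
        return "Failed"
    return state


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A job created 5 days ago and still Pending would be marked as failed.
    print(resolve_state("Pending", now - timedelta(days=5), now))
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.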

Environment

None

Activity

Slava Sarzhan March 18, 2021 at 8:07 PM

I think we definitely need to improve the following case: if the previous backup has not finished yet (it is still in the Running state), we should not start a new one.

Slava Sarzhan March 18, 2021 at 8:03 PM

Yes, but it depends on the situation. If the user specifies a 'storageClassName' that has not been created yet, they will end up with 10 backups. This will not break the cluster, because we do not allow backups to run on all pods: at least one pod keeps handling traffic, some of the backups fail because others are in progress, and a backup that cannot get a donor simply fails. But if the user made a typo in the CR and then corrected it, they will have 9 backups in the "Pending" state and one in 'Running'.

Vadim Tkachenko March 18, 2021 at 7:47 PM

If the issue is fixed, the ONE pending backup will complete, and then the following jobs can be created (there will be no jobs left in the Pending state).

Imagine you have 10 Pending jobs and the issue is fixed: we will then have 10 backup jobs all trying to run a backup at the same time.

Slava Sarzhan March 18, 2021 at 7:19 PM

But what if the issue gets fixed, e.g. the user set a wrong 'storageClassName' and then corrected it? We would have many jobs that did not complete because of the wrong configuration, and a new one (with the correct configuration) could not start because the previous ones had the wrong configuration. I will discuss it with   and we will try to find a way to improve it.

Vadim Tkachenko March 18, 2021 at 6:46 PM

I understand why they are in "Pending" state, but I want us to be more advanced in handling this.

At the very least, if there is a backup job in the "Pending" state, another backup job should not be created.
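
The guard discussed in this thread (no new backup while an earlier one is still "Pending" or "Running") could look roughly like the sketch below. The function `should_create_backup` and the "Succeeded" and "Failed" state names are hypothetical; only "Pending" and "Running" come from the discussion, and this is not taken from the operator's code.

```python
# States that mean an earlier backup is still outstanding (from the discussion above).
ACTIVE_STATES = {"Pending", "Running"}


def should_create_backup(existing_states):
    """Return True only if no existing backup job is Pending or Running."""
    return not any(state in ACTIVE_STATES for state in existing_states)


if __name__ == "__main__":
    # One job is still Pending, so the scheduled run would be skipped.
    print(should_create_backup(["Succeeded", "Pending"]))
    # All earlier jobs have finished, so a new backup may be created.
    print(should_create_backup(["Succeeded", "Failed"]))
```

With such a check, at most one job waits on a misconfigured 'storageClassName', so correcting the configuration releases a single backup rather than ten at once.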

Created September 16, 2020 at 12:23 PM
Updated March 5, 2024 at 6:07 PM