Backup jobs hang indefinitely
Description
Activity

Slava Sarzhan March 18, 2021 at 8:07 PM
I think we definitely need to improve the following case: if the previous backup has not finished yet (it is still in the Running state), we should not start a new one.

Slava Sarzhan March 18, 2021 at 8:03 PM
Yes, but it depends on the situation. If the user specifies a 'storageClassName' that has not been created yet, they will have 10 backups. (This will not break the cluster, because we do not allow backups to run on all pods: at least one pod will keep handling traffic, some of the backups will simply fail because others are in progress, and if a backup node can't get a donor the backup just fails.) But if they made a typo in the CR and then corrected it, the user will have 9 backups in the "Pending" state and one in 'Running'.

Vadim Tkachenko March 18, 2021 at 7:47 PM
If the issue is fixed, then the ONE pending backup will be completed, and the following jobs can then be created (there will be no jobs left in the Pending state).
Imagine you have 10 Pending jobs and the issue is fixed: then we will have 10 backup jobs all trying to run at the same time.

Slava Sarzhan March 18, 2021 at 7:19 PM
But what if the issue has been fixed, e.g. the user set a wrong 'storageClassName' and then corrected it? We would then have a lot of jobs that were not completed because of the wrong configuration, and a new one (with the correct configuration) could not be started because the previous ones had the wrong configuration. I will discuss it with and we will try to find a way to improve it.

Vadim Tkachenko March 18, 2021 at 6:46 PM
I understand why they are in the "Pending" state, but I want us to be more advanced in handling this.
At the very least, if there is a backup job in the "Pending" state, another backup job should not be created.
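
The guard discussed in this thread (do not create a new backup while an earlier one is still "Pending" or "Running") could look roughly like the sketch below. It is only an illustration, not the operator's code: the `Backup` class, the `should_start_backup` function, and the "Succeeded" and "Failed" state names are hypothetical; only "Pending" and "Running" come from the comments above.

```python
from dataclasses import dataclass

# States that block a new backup. "Pending" and "Running" are the states
# named in this thread; any other state name used here is an assumption.
ACTIVE_STATES = {"Pending", "Running"}


@dataclass
class Backup:
    """Hypothetical, minimal view of a backup object: a name and a state."""
    name: str
    state: str


def should_start_backup(existing):
    """Return True only if no earlier backup is still Pending or Running."""
    return not any(b.state in ACTIVE_STATES for b in existing)


# Example: one finished backup and one still pending, so a new one is skipped.
history = [Backup("backup-1", "Succeeded"), Backup("backup-2", "Pending")]
if not should_start_backup(history):
    print("skipping scheduled backup: a previous backup is still active")
```

With a check like this, a misconfiguration would leave at most one backup waiting, so correcting the configuration would not release many queued backups at once.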
Details
Assignee: Slava Sarzhan
Reporter: Vadim Tkachenko
Priority: Medium

I did not configure a backup volume, and I see that backup cron jobs start and hang indefinitely.
You can see that some jobs have been in the "Pending" state for 5 days.
I think these jobs should fail with an error after some period of time.
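
One way to express the requested behaviour (fail a job that has been "Pending" for too long) is a simple deadline check, sketched below. The `is_stuck` function and the one-hour default are assumptions made for illustration; the ticket does not say how long the period should be or where such a check would live.

```python
from datetime import datetime, timedelta


def is_stuck(state, created_at, now, pending_timeout=timedelta(hours=1)):
    """Return True if a job has been Pending for longer than pending_timeout.

    The one-hour default is an arbitrary placeholder, not a value from the
    ticket.
    """
    return state == "Pending" and (now - created_at) > pending_timeout


# Example: a job created 5 days ago and still Pending exceeds the deadline,
# so it would be marked as failed with an error instead of waiting forever.
now = datetime(2021, 3, 18, 18, 46)
created = now - timedelta(days=5)
if is_stuck("Pending", created, now):
    print("backup job exceeded the pending deadline: mark it as failed")
```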