Details
Assignee: Andrii Dema
Reporter: yoann.lacancellera
Needs QA: Yes
Story Points: 5
Sprint: (none)
Fix versions: (none)
Affects versions: (none)
Priority: Medium
Created January 16, 2025 at 3:37 PM
Updated March 19, 2025 at 12:54 PM
When using ttlSecondsAfterFinished, there appears to be a race condition in which the job is deleted before the operator has had time to reconcile the perconapgbackups object.
This does not happen only with unreasonably short ttlSecondsAfterFinished values; it also happens with 1m, 5m, or even 30m timeouts.
The perconapgbackups object then stays Running forever, blocking subsequent backups.
(Not sure how 3 of them ended up Running; some logs have unfortunately rotated.)
When getting
it ends up stuck right at the beginning of the loop
I would argue there should be a mechanism to automatically drop this kind of stale perconapgbackups object.
I would expect it to be marked as failed and the backup retried (or the next one allowed to run, if one already exists), but here it prevents backups from running.
It is also unclear to me why the operator can miss some jobs even with a ttl of 30m.
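To make the suggestion about stale objects concrete, here is a minimal sketch of the decision such a mechanism might take. It is not the operator's actual code: `BackupView`, `resolve_state`, the `grace` parameter, and the state names other than Running are all hypothetical, and the lookup of the Job itself is reduced to a boolean.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BackupView:
    """Hypothetical, simplified view of a perconapgbackups object."""
    name: str
    state: str            # assumed values: "Running", "Succeeded", "Failed"
    started_at: datetime


def resolve_state(backup: BackupView,
                  job_exists: bool,
                  now: datetime,
                  grace: timedelta = timedelta(minutes=5)) -> str:
    """Return the state the backup should be moved to.

    A backup that is still "Running" although its Job no longer exists
    (for example, removed via ttlSecondsAfterFinished) is treated as
    stale and reported as "Failed" once the grace period has elapsed.
    """
    if backup.state != "Running":
        return backup.state      # terminal states are left alone
    if job_exists:
        return "Running"         # the Job is still there, keep waiting
    if now - backup.started_at < grace:
        return "Running"         # too early to call it stale
    return "Failed"              # Job is gone and grace period is over


now = datetime(2025, 1, 16, 16, 0, tzinfo=timezone.utc)
stale = BackupView("backup-1", "Running", now - timedelta(hours=1))
print(resolve_state(stale, job_exists=False, now=now))
```

With a rule like this, a stale object would stop blocking the queue, and the next scheduled backup could run or the failed one could be retried. The grace period is only there so that a backup whose Job has not been observed yet is not failed prematurely; its value here is arbitrary.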
How to reproduce
After some time, I end up reproducing this issue.