Our demand-backup-sharding test fails sporadically because backups end up in error status. What is more, some of them actually finish even though they are marked as errored.
It looks like this:
backup1 finished, backup3 started and errored, and then backup2 errored as well. In this case backup3 actually finished in PBM, while backup2 was started when backup3 was still running in PBM, so PBM just ignored it and it never finished.
List of backups from PBM:
As you can see, two of them finished.
What happens is that we have "pbmStartingDeadline" set to 120 seconds (or so we thought), and if a backup stays in the starting state for longer than 120 seconds, we mark it as error. The problem is that we never actually waited 120 seconds before marking a backup as error; backups were marked almost instantly.
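For reference, the intended behaviour of the deadline check can be sketched roughly as below. This is only an illustration in Python, not the operator's actual code: the names `PBM_STARTING_DEADLINE`, `starting_deadline_exceeded` and `next_status`, as well as the status strings, are hypothetical and chosen to mirror the description above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the "pbmStartingDeadline" setting (120 seconds).
PBM_STARTING_DEADLINE = timedelta(seconds=120)


def starting_deadline_exceeded(started_at, now, deadline=PBM_STARTING_DEADLINE):
    """Return True only if the backup has been starting for longer than the deadline."""
    return now - started_at > deadline


def next_status(status, started_at, now, deadline=PBM_STARTING_DEADLINE):
    """Move a backup to "error" only when it is still starting past the deadline."""
    if status == "starting" and starting_deadline_exceeded(started_at, now, deadline):
        return "error"
    # Any other case (deadline not reached, or backup already running) is left alone.
    return status


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Five seconds after start the backup must still be "starting", not "error".
    print(next_status("starting", t0, t0 + timedelta(seconds=5)))
    # Only after more than 120 seconds should it be marked as "error".
    print(next_status("starting", t0, t0 + timedelta(seconds=121)))
```

With a check like this, a backup that has been starting for only a few seconds keeps its status, which is the behaviour we expected and did not get.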