CronJob backup retry is prevented by retention policy

Description

Please update the documentation to describe the retry logic for scheduled backups, and mention that "delete" permissions on backup files are required for a backup to succeed after a retry.

A Google Cloud Storage retention period is the apparent cause of the S3 object deletion errors.

If a backup is not fully uploaded during the first attempt, subsequent backup attempts fail because the retention policy prevents deleting the partially uploaded files.
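The failure mode can be sketched with a toy in-memory model. This is only an illustration of the sequence the log suggests (a delete step that is denied, followed by an upload step that refuses to overwrite). `RetentionLockedBucket` and `attempt_backup` are hypothetical names and are not part of the operator or xbcloud.

```python
class RetentionLockedBucket:
    """Toy stand-in for a bucket whose retention policy blocks deletes."""

    def __init__(self):
        self.objects = set()

    def delete(self, name):
        # Assumption: an existing object cannot be removed while retention applies.
        if name in self.objects:
            raise PermissionError(f"Access denied: cannot delete {name}")

    def put(self, name):
        # Assumption: the upload step refuses to overwrite an existing backup name.
        if name in self.objects:
            raise FileExistsError(f"backup named {name} already exists!")
        self.objects.add(name)


def attempt_backup(bucket, name):
    """One retry: try to clear leftovers, then upload. Returns a status string."""
    try:
        bucket.delete(name)
    except PermissionError:
        pass  # the delete error is logged and the attempt carries on to the upload
    try:
        bucket.put(name)
    except FileExistsError:
        return "failed"
    return "uploaded"


bucket = RetentionLockedBucket()
bucket.objects.add("example-full")  # leftover from an interrupted first attempt
print([attempt_backup(bucket, "example-full") for _ in range(3)])
```

In this model every retry fails in the same way, because the leftover object can be neither deleted nor overwritten.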

Below are the backup pod logs for a 5.7 cluster. Ten pods in the "Error" state were created for this backup. This log is from the earliest pod, and even it starts with a delete error.

The "...-2021-04-27-00:00:02-full.sst_info already exists" error appears in every pod created by the problematic job. There is no "first

Backup to s3://...-2021-04-27-00:00:02-full started
+ mc -C /tmp/mc config host add dest https://storage.googleapis.com ...
Added `dest` successfully.
+ xbcloud delete --storage=s3 --s3-bucket=...-2021-04-27-00:00:02-full.sst_info
210427 17:54:37 xbcloud: Failed to delete object ...-2021-04-27-00:00:02-full.sst_info/sst_info.00000000000000000000. Error message: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>xxx does not have storage.objects.delete access to the Google Cloud Storage object.</Details></Error>
....
210427 17:54:41 xbcloud: error: backup named ...-2021-04-27-00:00:02-full.sst_info already exists!
...
+ xbcloud put --storage=s3 --parallel=10 --md5 --s3-bucket=...-2021-04-27-00:00:02-full
...
210427 17:54:42 xbcloud: error: backup named ...-2021-04-27-00:00:02-full already exists!
mc: <ERROR> Unable to stat `...-2021-04-27-00:00:02-full.md5`. Object does not exist.
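A minimal sketch for spotting this pattern in a saved backup pod log. The two regular expressions are written against the messages quoted above and are an assumption about their wording, which may differ between xbcloud or storage versions. `diagnose` is a hypothetical helper name.

```python
import re

# Patterns modelled on the log lines quoted in this report (assumed wording).
DELETE_DENIED = re.compile(r"does not have (storage\.objects\.delete) access")
ALREADY_EXISTS = re.compile(r"xbcloud: error: backup named (\S+) already exists!")


def diagnose(log_text):
    """Return the missing permissions and the backup names that blocked an upload."""
    missing = sorted(set(DELETE_DENIED.findall(log_text)))
    blocked = ALREADY_EXISTS.findall(log_text)
    return {"missing_permissions": missing, "blocked_backups": blocked}
```

If both lists are non-empty for the same pod, the retry was most likely blocked by leftover objects that the backup account is not allowed to delete.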

Environment

None

AFFECTED CS IDs

CS0017713

Done

Created April 28, 2021 at 2:53 PM
Updated March 5, 2024 at 5:52 PM
Resolved October 14, 2021 at 10:29 AM
