Resync takes a long time (13 minutes) with a storage that contains only 20 backups.
The machine has 4 CPUs.
It is unclear why resync takes this long.
Observations
As mentioned in the comments, resync is slow because fetching the physical restore metadata takes a long time. This is mainly due to how the metadata is stored: instead of living in a single file, it is split across many small files in the storage (such as S3 or GCS).
For each restore, all of these files must be read and parsed to determine the status of the cluster, of each replica set, and of each node. This means many small download-and-decode operations, which add up quickly.
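To make the cost concrete, here is a minimal Go sketch of that access pattern. The Storage interface, the nodeMeta type, and the file layout are illustrative assumptions for this sketch, not PBM's actual API:

```go
// Package resync illustrates the slow path; all names are hypothetical.
package resync

import (
	"encoding/json"
	"fmt"
	"io"
)

// Storage abstracts an object store such as S3 or GCS.
type Storage interface {
	List(prefix string) ([]string, error)  // list object keys under a prefix
	Get(key string) (io.ReadCloser, error) // one network round trip per object
}

// nodeMeta stands in for a per-node status file under .pbm.restore/.
type nodeMeta struct {
	Node   string `json:"node"`
	Status string `json:"status"`
}

// resyncRestores reads and decodes every metadata file of every restore.
// With R restores, S replica sets, and N nodes per set, that is roughly
// R*S*N small GET requests, which is what dominates resync time.
func resyncRestores(st Storage) error {
	keys, err := st.List(".pbm.restore/")
	if err != nil {
		return err
	}
	for _, k := range keys {
		rc, err := st.Get(k) // a network round trip for a tiny file
		if err != nil {
			return err
		}
		var m nodeMeta
		err = json.NewDecoder(rc).Decode(&m)
		rc.Close()
		if err != nil {
			return fmt.Errorf("decode %s: %w", k, err)
		}
		// ... merge m into the aggregated cluster/replica-set status ...
	}
	return nil
}
```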
Acceptance criteria:
make the sync/resync faster
please specify more criteria during the implementation
introduce a flag and config option to skip the .pbm.restore parsing and process just the last entry required for physical restore (see the sketch after this list)
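A rough sketch of that proposed shortcut, reusing the hypothetical Storage interface and nodeMeta type from the sketch above. It also assumes that restore names sort chronologically, which holds for timestamp-based names but is an assumption of this illustration:

```go
package resync

import (
	"encoding/json"
	"sort"
)

// resyncLatestRestore downloads and decodes metadata for the most recent
// restore only, turning R restores' worth of small GETs into one restore's worth.
func resyncLatestRestore(st Storage) error {
	keys, err := st.List(".pbm.restore/")
	if err != nil || len(keys) == 0 {
		return err
	}
	sort.Strings(keys) // chronological order, given timestamp-based names
	latest := keys[len(keys)-1]

	rc, err := st.Get(latest) // a single round trip instead of one per file
	if err != nil {
		return err
	}
	defer rc.Close()

	var m nodeMeta // status of the latest restore
	return json.NewDecoder(rc).Decode(&m)
}
```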
Hi, we plan to change the default for pbm config --force-resync so that it reads only the latest entry in .pbm.restore (skipping the rest) and fetches the full restore metadata only when --include-restores is additionally specified. Could you confirm this is okay for you? Thanks
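Assuming the change ships as described, pbm config --force-resync would then parse only the newest .pbm.restore entry, while pbm config --force-resync --include-restores would keep the current behavior of fetching the full restore metadata.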
ege.gunes
March 12, 2025 at 3:21 PM
Looks like the problem is resyncing physical restore metadata. I removed the .pbm.restore dir from the storage, and resync finished in 1m for 20 backups.
radoslaw.szulgo
March 10, 2025 at 7:24 PM
Sure! Thanks for reporting. We'll review it as soon as possible and will most likely plan a fix for the next version.
Slava Sarzhan
March 10, 2025 at 4:57 PM
Could you please check this? It is critical for us. As you can see, even 15-20 backups with minimal data can take up to 20 minutes to resync.