Sharded cluster restoration failure in 2.8.0

Description

This ticket is a duplicate of in the PBM project as asked by for further investigation


Context:

PSMDB 1.19.1, pbm 2.8.0

Problem:

PITR restoration of a sharded cluster fails with the following error (I have never seen it before):

Attached are the psmdb operator and pbm logs, the pbm status output, and the psmdb, psmdb-restore, and psmdb-backup CRs. The backup files can be found in percona-dev AWS, bucket name oksana-s3-2; direct link: https://us-east-2.console.aws.amazon.com/s3/buckets/oksana-s3-2?region=us-east-2&bucketType=general&prefix=mongodb-5ta/&showversions=false


STR

  1. Create a sharded cluster: 3 nodes, 3 shards (rs), 3 config servers, PITR enabled

  2. Take a scheduled backup

  3. Add ~1.2 GB of data (replace PASSWORD with the databaseAdmin password, and set CLUSTER_NAME and NAMESPACE):

  4. Take a scheduled backup

  5. Enable sharding for the collection:

    1. Get the clusterAdmin password (replace NAMESPACE and CLUSTER_NAME):

    2. Run percona-client:

    3. Connect to mongo as clusterAdmin:

    4. Shard the collection:

    5. Wait until the data is distributed (rerun the last command until it shows 3 shards and the distribution percentages stay stable between runs)

  6. Take a scheduled backup

  7. Wait ~30 min
    (UPD: in my case I monitored the PITR chunk upload intervals in pbm status. The interval is updated every 10 min by default. At first the interval ended at the time the latest backup completed; 10 min later it was updated (update1, +10 min), and 10 min after that it was updated again (update2, +10 min).)

  8. Restore using the latest backup + PITR to a time after the latest backup (UPD: a time between update1 and update2)

  9. Restore fails with:
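
For reference, the timing in steps 7 and 8 can be sketched in Python as below. This is a minimal illustration, assuming the PITR interval end advances exactly every 10 min after the latest backup completes, as described in step 7. The function name, the example timestamp, and the output format are hypothetical and are not taken from the attached logs or CRs.

```python
from datetime import datetime, timedelta


def pitr_target_between_updates(backup_done, span_min=10):
    """Return (update1, update2, target) for a restore between two PITR updates.

    backup_done: time the latest backup completed (hypothetical input).
    span_min: assumed interval between PITR chunk uploads, 10 min per step 7.
    """
    span = timedelta(minutes=span_min)
    update1 = backup_done + span          # first interval update, +10 min
    update2 = update1 + span              # second interval update, +10 min
    # Pick the midpoint so the target lies strictly between the two updates.
    target = update1 + (update2 - update1) / 2
    return update1, update2, target


if __name__ == "__main__":
    # Hypothetical completion time of the latest backup.
    done = datetime(2025, 2, 13, 10, 0, 0)
    u1, u2, target = pitr_target_between_updates(done)
    print("update1:", u1.strftime("%Y-%m-%d %H:%M:%S"))
    print("update2:", u2.strftime("%Y-%m-%d %H:%M:%S"))
    print("restore target:", target.strftime("%Y-%m-%d %H:%M:%S"))
```

In practice the actual interval end times should be read from pbm status rather than computed, since chunk uploads may not land on exact 10 min boundaries.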

Environment

None

Attachments

7


Created February 13, 2025 at 12:01 PM
Updated April 15, 2025 at 11:20 AM