restore fails with applyOps: (Location10065) invalid parameter: expected an object ()
Description
Environment
Attachments
Activity
Tomislav Plavcic October 4, 2021 at 9:50 AM (edited)
@andrew.pogrebnoi
Here are the steps to reproduce:
Use the script from here: https://github.com/tplavcic/cloud-qa/blob/main/mongo_startup.sh
(it's the same old script, but changed to run 3 shards instead of 2; it seems easier to reproduce with 3 shards)
Use the PSMDB binary tarball and the PBM binary tarball with this script.
Steps:
$ ./mongo_startup.sh -s --pbmDir=/home/plavi/lab/pbm/bin/percona-backup-mongodb-1.6.0 --pbmStorage=fs
$ export PBM_MONGODB_URI=mongodb://localhost:27017/
$ ./pbm_store_set.sh
$ ./pbm backup # first backup
Wait until "config.system.sessions" starts to get sharded and its chunks get migrated; it's best to have chunks on all 3 shards, like below (it might take a few minutes for this to start, and I'm not sure what exactly kicks it off in mongo; a quick config.chunks query for checking the distribution follows the sample output):
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
config.system.sessions
shard key: { "_id" : 1 }
unique: false
balancing: true
chunks:
rs1 1020
rs2 2
rs3 2
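A quick way to check the distribution without reading the whole sh.status() output is to count chunks per shard in config.chunks. This is just a convenience check, not one of the original steps; it relies on config.chunks still having the "ns" field, which is the case on 4.2/4.4 (5.0 keys chunks by collection UUID instead):
mongos> db.getSiblingDB("config").chunks.aggregate([ { $match: { ns: "config.system.sessions" } }, { $group: { _id: "$shard", chunks: { $sum: 1 } } } ])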
$ ./pbm backup # run the second backup while this migration is in progress
mongos> sh.stopBalancer()
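The ID for the restore is just the backup name (the timestamp), so if needed it can be looked up first with:
$ ./pbm list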
$ ./pbm restore <ID of second (or last) backup>
Notice this in the pbm logs:
$ ./pbm logs
2021-10-04T09:27:58Z I [rs3/localhost:29019] [restore/2021-10-04T09:26:38Z] restore started
2021-10-04T09:27:58Z I [rs2/localhost:28019] [restore/2021-10-04T09:26:38Z] restore started
2021-10-04T09:27:58Z I [rs1/localhost:27019] [restore/2021-10-04T09:26:38Z] restore started
2021-10-04T09:27:58Z I [config/localhost:27028] [restore/2021-10-04T09:26:38Z] restore started
2021-10-04T09:28:02Z I [rs3/localhost:29019] [restore/2021-10-04T09:26:38Z] mongorestore finished
2021-10-04T09:28:02Z I [rs2/localhost:28019] [restore/2021-10-04T09:26:38Z] mongorestore finished
2021-10-04T09:28:02Z I [rs1/localhost:27019] [restore/2021-10-04T09:26:38Z] mongorestore finished
2021-10-04T09:28:02Z I [config/localhost:27028] [restore/2021-10-04T09:26:38Z] mongorestore finished
2021-10-04T09:28:04Z I [rs2/localhost:28019] [restore/2021-10-04T09:26:38Z] restoring users and roles
2021-10-04T09:28:04Z I [rs3/localhost:29019] [restore/2021-10-04T09:26:38Z] restoring users and roles
2021-10-04T09:28:04Z I [rs2/localhost:28019] [restore/2021-10-04T09:26:38Z] starting oplog replay
2021-10-04T09:28:04Z I [rs3/localhost:29019] [restore/2021-10-04T09:26:38Z] starting oplog replay
2021-10-04T09:28:04Z I [rs2/localhost:28019] [restore/2021-10-04T09:26:38Z] oplog replay finished on {1633339611 20}
2021-10-04T09:28:04Z I [rs3/localhost:29019] [restore/2021-10-04T09:26:38Z] oplog replay finished on {1633339614 21}
2021-10-04T09:28:04Z I [rs2/localhost:28019] [restore/2021-10-04T09:26:38Z] restore finished successfully
2021-10-04T09:28:04Z I [rs3/localhost:29019] [restore/2021-10-04T09:26:38Z] restore finished successfully
2021-10-04T09:28:04Z I [rs1/localhost:27019] [restore/2021-10-04T09:26:38Z] restoring users and roles
2021-10-04T09:28:04Z I [rs1/localhost:27019] [restore/2021-10-04T09:26:38Z] starting oplog replay
2021-10-04T09:28:04Z E [rs1/localhost:27019] [restore/2021-10-04T09:26:38Z] restore: oplog apply: applying an entry: applyOps: (Location10065) invalid parameter: expected an object ()
2021-10-04T09:28:04Z E [config/localhost:27028] [restore/2021-10-04T09:26:38Z] restore: waiting for start: cluster failed: oplog apply: applying an entry: applyOps: (Location10065) invalid parameter: expected an object ()
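Just as an idea for narrowing it down (I haven't dug into it yet): the failing entries are presumably applyOps command entries written around the chunk migration, so while the migration is running they can be inspected directly in the shard primary's oplog with something like:
rs1:PRIMARY> db.getSiblingDB("local").oplog.rs.find({ op: "c", "o.applyOps": { $exists: true } }).sort({ $natural: -1 }).limit(5).pretty()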
Kimberly Wilkins September 30, 2021 at 7:10 AM
Does this occur all the time? Or only when using Kubernetes envs?
Tomislav Plavcic September 29, 2021 at 8:13 AM
1. I have found today that 4.0 is not affected by this. So 4.0 and 5.0 seem unaffected, while 4.2/4.4 seem affected.
2. Yes, this is just a regular backup/restore without PITR.
andrew.pogrebnoi September 29, 2021 at 8:05 AM
Thanks, Tomislav!
Did you try v4.0?
It's just a regular backup/restore - without any PITR. Right?
Tomislav Plavcic September 24, 2021 at 3:14 PM (edited)
So I have played a bit with this today.
Good news: it doesn't affect PSMDB 5.0.2, only 4.4/4.2.
Bad news: sleeping before the backup and before the restore doesn't help. I also tried inserting much more data during the sleep before the backup, and it didn't help either.
I have managed to reproduce it manually at least once, without much data, with steps like these:
1. create a new user: db.createUser({user:"myApp",pwd:"myPass",roles:[{db:"myApp",role:"readWrite"}]})
2. add 4 documents into one test collection
3. create backup
4. drop test collection
5. do restore
Not sure if any of this helps, but these are my findings so far.
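For reference, steps 1 and 2 above look roughly like this in the mongo shell (the test1 collection name is just an example):
> use myApp
> db.createUser({user:"myApp",pwd:"myPass",roles:[{db:"myApp",role:"readWrite"}]})
> for (var i = 0; i < 4; i++) { db.test1.insert({ x: i }) }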
The issue seems to happen on restore and looks like:
...
2021-09-19T20:05:28.150+0000 no indexes to restore for collection config.migrationCoordinators
2021-09-19T20:05:28.150+0000 no indexes to restore for collection config.rangeDeletions
2021-09-19T20:05:28.150+0000 no indexes to restore for collection admin.pbmRUsers
2021-09-19T20:05:28.000+0000 I [restore/2021-09-19T20:01:28Z] mongorestore finished
2021-09-19T20:05:31.000+0000 I [restore/2021-09-19T20:01:28Z] restoring users and roles
2021-09-19T20:05:31.000+0000 I [restore/2021-09-19T20:01:28Z] starting oplog replay
2021-09-19T20:05:31.000+0000 E [restore/2021-09-19T20:01:28Z] restore: oplog apply: applying an entry: applyOps: (Location10065) invalid parameter: expected an object ()
2021-09-19T20:05:31.000+0000 D [restore/2021-09-19T20:01:28Z] releasing lock
I'm not sure at the moment whether it's caused by some parameter we pass or whether it's a genuine PBM issue.
More logs:
rs0 pbm-agent logs:
cfg0 pbm-agent logs:
operator log:
Version info:
2021-09-20T08:35:28.000+0000 I pbm-agent:
Version: 1.6.0
Platform: linux/amd64
GitCommit: 5ccc7f0911ddc73c6a33bb36c85e2a8183f17bea
GitBranch: release-1.6.0-fix-k9sdocker
BuildTime: 2021-09-19_16:53_UTC
GoVersion: go1.15.15
MongoDB version: 4.4.8-9
PBM status:
Steps to reproduce:
1. set env variables
export IMAGE=perconalab/percona-server-mongodb-operator:k8spsmdb-566
export IMAGE_PMM=percona/pmm-client:2.21.0
export IMAGE_BACKUP=perconalab/percona-server-mongodb-operator:1.10.0-backup
export IMAGE_MONGOD=percona/percona-server-mongodb:4.4.8-9
2. git clone https://github.com/percona/percona-server-mongodb-operator.git --branch release-1.10.0
3. add the cloud secret into e2e-tests/conf/cloud-secret.yml (if you don't have it, please ping me)
4. execute e2e-tests/demand-backup-sharded/run and wait until it reaches the restore from the AWS backup (~20 min)
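To watch the restore while the test runs, something like the following can be used; the psmdb-backup/psmdb-restore short names and the backup-agent sidecar container name are what I'd expect for this operator version, but they may differ, and the pod name below is just a placeholder:
$ kubectl get psmdb-backup
$ kubectl get psmdb-restore
$ kubectl logs <cluster-rs0-pod> -c backup-agent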