Duplicates in backup made by PBM agent

Description

When restoring data from a backup, errors of this kind appeared:
E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "67d9a7fc-2676-4dbf-9f0a-10219521bt53", categoryId: 13 } }

I inspected the BSON file and found that it contains duplicates of some documents. The copies differ only in the value of the expiringAt field, which is a date:

{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},"expiringAt":{"$date":{"$numberLong":"1702793217011"}}}
{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},"expiringAt":{"$date":{"$numberLong":"1700092058180"}}}

After converting the dates to a human-readable form, I found that one entry is identical to the document that was restored, while the other entry is the duplicate.
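The check described above can be sketched in Python. This is a minimal sketch, assuming the BSON file has first been converted to one JSON document per line (for example with bsondump) and that every document has an expiringAt date in the {"$date": {"$numberLong": ...}} form shown above. The names to_datetime, find_duplicates and sample are hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(date_field):
    """Convert {"$date": {"$numberLong": "<ms>"}} to an aware UTC datetime."""
    millis = int(date_field["$date"]["$numberLong"])
    # timedelta arithmetic avoids the float rounding of fromtimestamp(ms / 1000).
    return EPOCH + timedelta(milliseconds=millis)


def find_duplicates(lines):
    """Group documents by _id and return only the ids that occur more than once."""
    by_id = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        # Field order is kept on purpose: MongoDB treats embedded _id documents
        # with a different field order as different values.
        key = json.dumps(doc["_id"], separators=(",", ":"))
        by_id[key].append(to_datetime(doc["expiringAt"]))
    return {key: dates for key, dates in by_id.items() if len(dates) > 1}


# The two documents from this ticket, used as sample input.
sample = [
    '{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},"expiringAt":{"$date":{"$numberLong":"1702793217011"}}}',
    '{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},"expiringAt":{"$date":{"$numberLong":"1700092058180"}}}',
]

for key, dates in find_duplicates(sample).items():
    print(key, [d.isoformat() for d in dates])
```

For the two sample documents, the expiringAt values correspond to 2023-12-17 06:06:57 UTC and 2023-11-15 23:47:38 UTC. For a real dump, the lines would be read from the converted file instead of the sample list.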

My hypothesis was that a document is updated during the backup and both versions end up in the backup file. To test it, I ran mongodump on the collection and, while it was running, updated the expiringAt field of a document. The completed mongodump contained no duplicates, so I could not reproduce the problem this way.

I then assumed that chunk balancing affects the backup and could cause duplicates to end up in it. I stopped the balancer one day beforehand and performed a backup. Duplicates still appeared during the restore:

2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 2777838 } }
2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 1452892 } }
2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "67d9a7fc-2676-4dbf-9f0a-10219521bf53", categoryId: 13 } }
2023-11-03T10:32:47.783+0000 ########################### sponsore.interested 30.4GB/30.4GB (100.0%)
2023-11-03T10:32:47.783+0000 finished restoring sponsore.interested (304553135 documents, 298 failures)
2023-11-03T10:32:47.783+0000 304553135 document(s) restored successfully. 298 document(s) failed to restore.
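To collect every affected key from a restore log like the one above, the log can be parsed with a small script. This is a rough sketch that assumes the line formats are exactly those shown in this ticket. The names DUP_RE, SUMMARY_RE, summarize and sample_log are hypothetical, and the index name is matched loosely so that any name is accepted.

```python
import re
from collections import Counter

# Matches the E11000 lines; the index name is matched as any non-space token.
DUP_RE = re.compile(
    r"E11000 duplicate key error collection: (?P<ns>\S+) "
    r"index: (?P<index>\S+) dup key: (?P<key>\{.*\})"
)
# Matches the per-collection summary line.
SUMMARY_RE = re.compile(
    r"finished restoring (?P<ns>\S+) "
    r"\((?P<docs>\d+) documents, (?P<failures>\d+) failures\)"
)


def summarize(log_lines):
    """Return (Counter of (namespace, key) pairs, {namespace: (docs, failures)})."""
    dup_keys = Counter()
    summaries = {}
    for line in log_lines:
        match = DUP_RE.search(line)
        if match:
            dup_keys[(match["ns"], match["key"])] += 1
            continue
        match = SUMMARY_RE.search(line)
        if match:
            summaries[match["ns"]] = (int(match["docs"]), int(match["failures"]))
    return dup_keys, summaries


# Lines copied from the restore log in this ticket.
sample_log = [
    '2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: id dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 2777838 } }',
    '2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: id dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 1452892 } }',
    '2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: id dup key: { _id: { visitorId: "67d9a7fc-2676-4dbf-9f0a-10219521bf53", categoryId: 13 } }',
    "2023-11-03T10:32:47.783+0000 finished restoring sponsore.interested (304553135 documents, 298 failures)",
]

dup_keys, summaries = summarize(sample_log)
for (namespace, key), count in dup_keys.items():
    print(namespace, key, count)
print(summaries)
```

Comparing the number of distinct keys found this way with the failure count in the summary line shows whether all rejected documents were logged.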

The backup is logical.
PBM version: v2.3.0
MongoDB versions differ slightly across shards: v5.0.15, v5.0.6, v5.0.18

Collection size: 200GB

Environment

None

Activity

Alexander Girke August 16, 2024 at 8:18 AM

Could you please tell me why this has been closed? The ticket you linked was also closed as a duplicate and, as far as I can see, didn't resolve the issue. Please re-open, because to my knowledge the bug still exists.

Jan Mynar August 15, 2024 at 9:04 AM

Closing this issue as a duplicate of https://perconadev.atlassian.net/browse/PBM-1197

radoslaw.szulgo July 30, 2024 at 9:02 AM

Any plans for this bug?

Duplicate

Details

Needs QA: Yes

Created November 3, 2023 at 11:11 AM
Updated August 16, 2024 at 8:22 AM
Resolved August 15, 2024 at 9:04 AM