Duplicate
Details
Assignee: Unassigned
Reporter: Anton Kireev
Needs QA: Yes
Affects versions:
Priority: Medium
Created November 3, 2023 at 11:11 AM
Updated August 16, 2024 at 8:22 AM
Resolved August 15, 2024 at 9:04 AM
When restoring data from a backup, errors like the following appeared:
E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "67d9a7fc-2676-4dbf-9f0a-10219521bf53", categoryId: 13 } }
I inspected the BSON file and found that it contains duplicates of some documents. The copies differ only in the value of the expiringAt field, which is a date:
{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},"expiringAt":{"$date":{"$numberLong":"1702793217011"}}}
{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},"expiringAt":{"$date":{"$numberLong":"1700092058180"}}}
After converting the dates to a human-readable form, I found that one entry matches the document that was actually restored, and the other entry is the duplicate.
My first hypothesis was that a document is updated while the backup is running and both versions end up in the backup file. To test it, I ran mongodump on the collection and updated the document's expiringAt field while the dump was in progress. I found no duplicates in the resulting dump, so I could not reproduce the problem this way.
Next, I assumed that chunk balancing interferes with the backup and causes duplicates to end up in it. I stopped the balancer one day in advance and then performed a backup. Duplicates still appeared during the restore.
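The check described above can be sketched in Python. This is a minimal sketch, assuming the dump was converted with bsondump into canonical Extended JSON with one document per line (so dates appear as {"$date": {"$numberLong": ...}}); the function names and the inline sample are illustrative, and the sample is simply the two documents shown above. It groups documents by _id and converts each copy's expiringAt from epoch milliseconds to UTC:

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def expiring_at_utc(doc):
    """Turn an Extended JSON {"$date": {"$numberLong": "<ms>"}} value into a UTC datetime."""
    millis = int(doc["expiringAt"]["$date"]["$numberLong"])
    # timedelta keeps millisecond precision exactly (no float rounding)
    return EPOCH + timedelta(milliseconds=millis)


def find_duplicate_ids(lines):
    """Group Extended JSON documents by _id; return only ids that occur more than once."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        # _id is a sub-document, so serialize it with sorted keys to get a hashable key
        key = json.dumps(doc["_id"], sort_keys=True)
        groups[key].append(doc)
    return {key: docs for key, docs in groups.items() if len(docs) > 1}


# The two documents from the backup file shown above
sample = [
    '{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},'
    '"expiringAt":{"$date":{"$numberLong":"1702793217011"}}}',
    '{"_id":{"visitorId":"5f87f8c7-78f2-5de2-b4a9-c10022157941","categoryId":{"$numberLong":"11271"}},'
    '"expiringAt":{"$date":{"$numberLong":"1700092058180"}}}',
]

dups = find_duplicate_ids(sample)
for key, docs in dups.items():
    print(key)
    for doc in docs:
        print("  expiringAt:", expiring_at_utc(doc).isoformat())
# expiringAt values printed: 2023-12-17T06:06:57.011000+00:00 and 2023-11-15T23:47:38.180000+00:00
```

In practice the lines would be read from the bsondump output file rather than from an inline list; serializing _id with sorted keys treats two copies as the same key regardless of field order.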
2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 2777838 } }
2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 1452892 } }
2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: sponsore.interested index: _id_ dup key: { _id: { visitorId: "67d9a7fc-2676-4dbf-9f0a-10219521bf53", categoryId: 13 } }
2023-11-03T10:32:47.783+0000 ########################### sponsore.interested 30.4GB/30.4GB (100.0%)
2023-11-03T10:32:47.783+0000 finished restoring sponsore.interested (304553135 documents, 298 failures)
2023-11-03T10:32:47.783+0000 304553135 document(s) restored successfully. 298 document(s) failed to restore.
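To list exactly which keys failed (298 in this restore), the duplicate-key lines can be extracted from the restore log. A minimal sketch, assuming the log has been saved as text and that every failing _id has the visitorId/categoryId shape seen above; the regex and the function name are illustrative, not part of PBM or mongorestore:

```python
import re
from collections import Counter

# Pattern for the duplicate-key lines as they appear in the restore log above
DUP_KEY_RE = re.compile(
    r'continuing through error: E11000 duplicate key error collection: (?P<ns>\S+) '
    r'index: \S+ dup key: \{ _id: \{ visitorId: "(?P<visitor_id>[^"]+)", '
    r'categoryId: (?P<category_id>\d+) \} \}'
)


def duplicate_keys(log_lines):
    """Return (namespace, visitorId, categoryId) for each duplicate-key failure in the log."""
    keys = []
    for line in log_lines:
        match = DUP_KEY_RE.search(line)
        if match:
            keys.append((match["ns"], match["visitor_id"], int(match["category_id"])))
    return keys


# Excerpt of the restore log shown above; in practice, read the full saved log file
log = [
    '2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: '
    'sponsore.interested index: _id_ dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 2777838 } }',
    '2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: '
    'sponsore.interested index: _id_ dup key: { _id: { visitorId: "40b78f7b-67e0-4d5c-a0e3-ea8ca9a30805", categoryId: 1452892 } }',
    '2023-11-03T10:32:47.783+0000 continuing through error: E11000 duplicate key error collection: '
    'sponsore.interested index: _id_ dup key: { _id: { visitorId: "67d9a7fc-2676-4dbf-9f0a-10219521bf53", categoryId: 13 } }',
    '2023-11-03T10:32:47.783+0000 finished restoring sponsore.interested (304553135 documents, 298 failures)',
]

keys = duplicate_keys(log)
print(len(keys))  # 3
print(Counter(ns for ns, _, _ in keys))  # Counter({'sponsore.interested': 3})
```

Run over the full log, the count should match the "failures" figure mongorestore reports, and the extracted keys can then be looked up in the dump to compare the duplicate copies.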
Backup type: logical.
PBM version: v2.3.0
MongoDB versions differ slightly across shards: v5.0.15, v5.0.6, v5.0.18
Collection size: 200 GB