During a test of $backupCursor, we found that the backup it produces may contain oplog holes. From the log:
$backupCursor returned oplogEnd {"t":1704373138,"i":4}, but the oplog entry {"t":1704373138,"i":3} is not in the backup files.
Looking at the implementation of $backupCursor in Percona Server for MongoDB, oplogEnd is taken from the last record of the oplog collection. There may be holes before that record, because earlier oplog entries may not be committed yet.
So my question is: can $backupCursor be used alone, without $backupCursorExtend? It looks like $backupCursorExtend waits for the given timestamp to be committed. Or should $backupCursor be implemented like $backupCursorExtend, waiting until oplogEnd is safe and has no holes before it?
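To illustrate the concern, here is a minimal toy model in Python. It is not Percona's or MongoDB's implementation. It assumes timestamps are allocated in order but may commit out of order. The entries with increments 1 and 2 are made up for illustration; only increments 3 and 4 come from the log above. The helper name `no_holes_point` is hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

# (seconds, increment), mirroring the {"t": ..., "i": ...} shape in the log.
Timestamp = Tuple[int, int]


def no_holes_point(
    allocated: Iterable[Timestamp], committed: Set[Timestamp]
) -> Optional[Timestamp]:
    """Return the newest timestamp such that it and every earlier
    allocated timestamp are committed, or None if there is none."""
    safe = None
    for ts in sorted(allocated):
        if ts not in committed:
            break  # first uncommitted entry: everything after it sits behind a hole
        safe = ts
    return safe


# Hypothetical state at the moment the backup cursor is opened:
# increment 3 has been allocated but is not committed yet.
allocated = [(1704373138, i) for i in (1, 2, 3, 4)]
committed = {(1704373138, 1), (1704373138, 2), (1704373138, 4)}

last_visible = max(committed)                      # what "last oplog record" would report
safe_point = no_holes_point(allocated, committed)  # what is safe to replay up to
```

In this model, taking oplogEnd from the last visible record yields increment 4 even though increment 3 is missing, while the no-holes point is still increment 2. Waiting until the no-holes point reaches oplogEnd is, as far as I can tell, the kind of guarantee the question attributes to $backupCursorExtend.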
Environment
None
Activity
radoslaw.szulgo October 1, 2024 at 8:44 AM
This is probably caused by a race condition that is hard to reproduce and troubleshoot. Investigating it would take weeks, so we are closing this as on demand.
Jan Mynar September 3, 2024 at 12:12 PM
This seems to be a very complex issue. We are not even able to reproduce it now, so we have to discuss it again with Radek.
Jan Mynar June 18, 2024 at 8:51 AM
Refinement comment: the plan is to reproduce the problem, find the root cause, and decide whether the problem is a bug or not.
Provided that we would like to start working on this in Q3/2024, moving this bug to the “open” state then.
MingTotti Guoming He February 8, 2024 at 8:37 AM
Hi,
Here are my answers to your questions
My problem is: can I use $backupCursor alone, without $backupCursorExtend? From my testing, doing so seems to risk losing oplog entries that fall in a hole before the backup timestamp.
My deployment type is a single replica set with three nodes.
The workload is simply inserting records in a single thread while another thread runs $backupCursor.
Thanks
Konstantin Trushin February 7, 2024 at 11:49 AM
Hello, sorry for not keeping you updated for so long. We are investigating the issue. In the meantime, it would be useful to know more context. Could you please provide us with the following:
what problem you are trying to solve;
your deployment type: single replica set or multiple shards;
workload type and, if possible, an approximate description of the operation that had not reached the oplog before the backup cursor was created.