Allow selective collection restore with name remapping
Description
Environment
None
AFFECTED CS IDs
CS0046712
100% Done
relates to
Activity

radoslaw.szulgo October 22, 2024 at 11:37 AM
Tasks:
tweak pbm restore to pass input collection and output collection (restore phase)
adjust oplog and PITR to support collection rename
Error handling:
don’t allow overwriting a collection - an error should be thrown if the target collection already exists
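The error-handling rule above can be sketched as a pre-restore check. This is only an illustration, not PBM code: `plan_restore`, `CollectionExistsError`, and the `existing` set of namespaces are hypothetical names, and the `drop` flag mirrors the `--drop` option listed in the requirements of this epic.

```python
from typing import List, Set


class CollectionExistsError(Exception):
    """Hypothetical error raised when the restore target already exists."""


def plan_restore(target_ns: str, existing: Set[str], drop: bool = False) -> List[str]:
    """Return the ordered steps for restoring into target_ns.

    existing is the set of "db.collection" namespaces already present
    on the target deployment (an assumption made for this sketch).
    """
    if target_ns in existing:
        if not drop:
            # Never overwrite silently: fail and point at the two ways out.
            raise CollectionExistsError(
                f"collection {target_ns!r} already exists; "
                "use --drop or choose another name"
            )
        return ["drop", "restore"]
    return ["restore"]
```

Restoring into a fresh name needs no drop, while restoring over an existing name fails unless the drop flag is set explicitly.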

radoslaw.szulgo October 22, 2024 at 11:33 AM
MVP
restore unsharded collection
GA:
restore sharded collection
restore timeseries collection

radoslaw.szulgo September 19, 2024 at 9:16 AM
When we get rid of mongotools this will be easier. It is still not easy, as we need to keep a mapping from the old collection UUID to the new collection UUID so we can correctly replay the oplog.
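A minimal sketch of the UUID mapping described in this comment, assuming oplog entries are plain dictionaries that carry the standard `ns` (namespace) and `ui` (collection UUID) fields. The function name `remap_oplog_entry` and the `uuid_map` dictionary are hypothetical, and command entries (for example those logged against `db.$cmd`) are deliberately not handled here.

```python
import uuid
from typing import Any, Dict, Optional


def remap_oplog_entry(
    entry: Dict[str, Any],
    ns_from: str,
    ns_to: str,
    uuid_map: Dict[uuid.UUID, uuid.UUID],
) -> Optional[Dict[str, Any]]:
    """Rewrite one oplog entry so it replays against the renamed collection.

    Returns None for entries that belong to other namespaces, so the
    caller can skip them during a selective restore.
    """
    if entry.get("ns") != ns_from:
        return None
    remapped = dict(entry)  # shallow copy; the original entry is left untouched
    remapped["ns"] = ns_to
    old_uuid = entry.get("ui")
    if old_uuid is not None:
        # The restored collection has a new UUID, so the old one must be translated.
        if old_uuid not in uuid_map:
            raise KeyError(f"no UUID mapping recorded for {old_uuid}")
        remapped["ui"] = uuid_map[old_uuid]
    return remapped
```

The mapping would have to be recorded when the restored collection is created, since that is the point at which the new UUID becomes known.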

radoslaw.szulgo September 17, 2024 at 12:31 PM
We’d rather not start it this year.

Aaditya Dubey May 7, 2024 at 7:19 AM
Hi
Thank you for the report and feedback.
Done
Details
Assignee

Reporter

Labels
Needs QA
Yes
Needs Doc
Yes
Needs Packaging
No
Fix versions
Priority
Created May 6, 2024 at 4:23 PM
Updated January 28, 2025 at 10:01 AM
Resolved January 14, 2025 at 2:15 PM
Problem description
<Clearly define the issue or challenge the epic seeks to address, outlining the impact on users or the system>
There are many situations where being able to analyze the differences between two database installations is extremely handy. For example, you have a problem with your production system. All you know is that the problem lies somewhere in the database, but where? It worked fine yesterday, so if you compare the current database with yesterday’s backup, you’ll most likely discover the problem. Or worse, an application or database update has removed vital configuration from the system, and it’s now crashing. If you have a backup, you could restore a collection and see how your data has changed, which may reduce troubleshooting time significantly. It gives you insight into your data and confident control over the changes, because you can see them explicitly and edit them directly. Unfortunately, with PBM, you can only perform a selective restore into a new database, which means additional effort and cost. Restoring the collection into the same database would require a new name for the collection, and that is not currently possible.
In the case of restoring a single collection (e.g. after a user mistakenly deletes some data), we want to be able to restore it with a different name to compare it with the existing collection, e.g. testcol vs testcol_restored.
A user has a non-sharded replicaSet. A workaround exists, but it involves using mongorestore and installing an additional package, “s2 compressor”. The workaround doesn’t work for sharded deployments. To use PITR, a user has to perform some manual scripting.
Solution hypothesis
<Describe the high-level approach or plan for addressing the problem, focusing on how the proposed changes will resolve the issue>
PBM should support a name-remapping option to handle such an operation within the PBM restore operation automatically.
We accept that transactions may be broken when selective backup or restore is used.
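The name-remapping idea can be illustrated with a small namespace-rewriting helper. This is a sketch under stated assumptions, not the PBM implementation: `remap_namespace` is a hypothetical name, and namespaces are assumed to be plain "db.collection" strings.

```python
def remap_namespace(ns: str, ns_from: str, ns_to: str) -> str:
    """Return ns_to when ns is the collection being remapped, else ns unchanged."""
    for name in (ns_from, ns_to):
        db, sep, coll = name.partition(".")
        # Both sides of the mapping must be full "db.collection" names.
        if not (db and sep and coll):
            raise ValueError(f"expected 'db.collection', got {name!r}")
    return ns_to if ns == ns_from else ns
```

For example, remapping test.testcol to test.testcol_restored rewrites only that namespace and leaves every other namespace as it is, so the restored copy can sit next to the original in the same database.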
Functional and non-functional requirements
<Specify the system’s capabilities and behaviors needed for the solution (functional), along with performance, security, and usability constraints (non-functional)>
Restoring a sharded collection can be implemented in the 2nd phase
Restoring a time-series collection and a view into a collection can be implemented in the 2nd phase
Indexes are copied with the new collection automatically
Point-in-time-recovery (PITR) via oplog replay is supported
--drop option to drop the collection if it already exists
Restore should fail with a proper message when a collection already exists and suggest using --drop or another name.
Success criteria
<List the conditions that must be met for the epic to be considered complete and for the solution to be accepted by stakeholders>
A collection comparison is doable - for example, via
We get the feedback from DELL that it works reliably
Competition
<Provides an overview of competing solutions, tools, or approaches in the market, highlighting their strengths and weaknesses compared to the proposed solution>
mongorestore, OpsManager, Cloud Manager
To restore a single database or a collection (or specific documents) from a snapshot, one can use the Queryable Backup to export a single database or collection to restore to the target deployment. For example:
mongorestore --port <port> --db <destination database> --collection <collection-name> <data-dump-path/dbname/collection.bson> --drop
One can include --drop to drop the collection in the destination cluster if the collection already exists.
Read more:
Dependencies
<Identify other projects, teams, or tasks the epic relies on or is linked to, ensuring smooth execution and integration>
The solution depends on mongorestore. When we eliminate that dependency, we need to adjust the solution to support the sharded cluster.
Scope
MVP
restore unsharded collection from replica set cluster
GA:
restore sharded collection from a sharded cluster
restore timeseries collection