Allow selective collection restore with name remapping

General

Escalation

Description

Problem description

Being able to analyze the differences between two states of a database is extremely useful in many situations. For example, your production system has a problem, and all you know is that it lies somewhere in the database, but where? It worked fine yesterday, so comparing the current database with yesterday’s backup will most likely reveal the problem. Or worse, an application or database update has removed vital configuration, and the system now crashes. With a backup, you could restore a single collection and see exactly how your data has changed, which can shorten troubleshooting significantly. Seeing the changes explicitly, and being able to edit them directly, gives you insight into your data and confident control over it.

Unfortunately, with PBM you can only perform a selective restore into a new database, which means additional effort and cost. Restoring the collection into the same database would require a new name for the collection, and that is currently not possible.

When restoring a single collection (e.g. after a user mistakenly deletes some data), we want to restore it under a different name so that it can be compared with the existing collection, for example:

testcol vs testcol_restored
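Once both collections exist side by side, comparing them comes down to matching documents by `_id`. The sketch below assumes the documents of `testcol` and `testcol_restored` have already been fetched into Python lists of dicts (for example with a driver query, not shown) and that the `_id` values are mutually comparable. `diff_collections` is a hypothetical helper for illustration, not part of PBM.

```python
from typing import Any


def diff_collections(
    current: list[dict[str, Any]], restored: list[dict[str, Any]]
) -> dict[str, list[Any]]:
    """Hypothetical helper: compare two collections' documents by _id."""
    cur_by_id = {doc["_id"]: doc for doc in current}
    res_by_id = {doc["_id"]: doc for doc in restored}
    common = cur_by_id.keys() & res_by_id.keys()
    return {
        # Inserted after the backup was taken.
        "only_in_current": sorted(cur_by_id.keys() - res_by_id.keys()),
        # Deleted after the backup (e.g. an accidental delete).
        "only_in_restored": sorted(res_by_id.keys() - cur_by_id.keys()),
        # Present in both, but with different contents.
        "changed": sorted(i for i in common if cur_by_id[i] != res_by_id[i]),
    }


# Usage: _id 2 was modified, _id 3 was deleted, _id 4 was inserted.
testcol = [{"_id": 1, "qty": 5}, {"_id": 2, "qty": 7}, {"_id": 4, "qty": 1}]
testcol_restored = [{"_id": 1, "qty": 5}, {"_id": 2, "qty": 3}, {"_id": 3, "qty": 9}]
print(diff_collections(testcol, testcol_restored))
# {'only_in_current': [4], 'only_in_restored': [3], 'changed': [2]}
```

Plain dict equality ignores field order, which BSON documents preserve; for spotting data changes that is usually the desired behavior.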

For a user with a non-sharded replica set, a workaround exists, but it involves running mongorestore and installing an additional package, the “s2 compressor”. The workaround does not work for sharded deployments, and using PITR with it requires manual scripting.

Solution hypothesis

PBM should support a name-remapping option so that the PBM restore operation handles such a restore automatically.
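A remapping option needs some validation: both sides should be fully qualified `<database>.<collection>` namespaces, and only the selected namespace is renamed. The sketch below assumes the option arrives as a source and a target namespace string; `split_namespace` and `remap_namespace` are hypothetical names and do not describe PBM's actual CLI or implementation. It relies on the fact that MongoDB database names cannot contain dots, while collection names can.

```python
def split_namespace(ns: str) -> tuple[str, str]:
    """Hypothetical helper: split 'db.coll' into ('db', 'coll').

    Splits on the first dot only, because collection names may contain dots.
    """
    db, sep, coll = ns.partition(".")
    if not sep or not db or not coll:
        raise ValueError(f"expected <database>.<collection>, got {ns!r}")
    return db, coll


def remap_namespace(ns: str, ns_from: str, ns_to: str) -> str:
    """Hypothetical helper: return the restore target for ns.

    Only ns_from is renamed to ns_to; every other namespace is unchanged.
    """
    split_namespace(ns_from)  # validate both sides of the mapping
    split_namespace(ns_to)
    return ns_to if ns == ns_from else ns


print(remap_namespace("test.testcol", "test.testcol", "test.testcol_restored"))
```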

We accept that using selective backup or restore breaks transactions.

Functional and non-functional requirements

Restoring a sharded collection can be implemented in the second phase
Restoring a time-series collection, or a view into a collection, can be implemented in the second phase
Indexes are restored with the new collection automatically
Point-in-time recovery (PITR) via oplog replay is supported
A --drop option drops the target collection if it already exists
Restore fails with a clear message when the target collection already exists, suggesting --drop or a different name
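The last two requirements amount to a pre-flight check on the target namespace before any data is written. A minimal sketch, assuming the caller already knows which collections exist in the target database (e.g. from `listCollections`, not shown) and whether --drop was passed; `check_target` and `RestoreConflict` are hypothetical names, not PBM code.

```python
class RestoreConflict(Exception):
    """Hypothetical error: target exists and --drop was not given."""


def check_target(existing: set[str], target_coll: str, drop: bool) -> bool:
    """Hypothetical pre-restore check.

    Returns True if the existing target must be dropped before restoring,
    False if the target name is free. Raises RestoreConflict otherwise.
    """
    if target_coll not in existing:
        return False
    if drop:
        return True
    raise RestoreConflict(
        f"collection {target_coll!r} already exists; "
        "use --drop to replace it or choose another target name"
    )


print(check_target({"testcol"}, "testcol_restored", drop=False))
```

Failing before any data is copied keeps the existing collection untouched, which matches the “don’t allow to overwrite a collection” rule in the activity log.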

Success criteria

A collection comparison is doable - for example, via
DELL confirms that it works reliably

Competition

`mongorestore`, Ops Manager, Cloud Manager

To restore a single database or collection (or specific documents) from a snapshot, one can use a Queryable Backup to export that database or collection and then restore it to the target deployment. For example:
mongorestore --port <port> --db <destination database> --collection <collection-name> <data-dump-path/dbname/collection.bson> --drop

One can include --drop to drop the collection in the destination cluster if the collection already exists.

Dependencies

The solution depends on mongorestore. Once that dependency is eliminated, the solution must be adjusted to support sharded clusters.

Scope

MVP

restore an unsharded collection from a replica set

GA:

restore a sharded collection from a sharded cluster
restore a time-series collection

Environment

None

AFFECTED CS IDs

CS0046712

Child work items

100% Done

Linked work items

relates to

PBM-690

restore with a different db name

Web links

https://www.notion.so/percona/Selective-collection-restore-with-name-remapping-174674d091f380369959e1ade810ebe7?pvs=4

Activity

radoslaw.szulgo
October 22, 2024 at 11:37 AM

Tasks:

tweak pbm restore to pass input collection and output collection (restore phase)
adjust oplog and PITR to support collection rename

Error handling:

don’t allow overwriting a collection: an error should be thrown if the collection exists

radoslaw.szulgo
October 22, 2024 at 11:33 AM

MVP

restore unsharded collection

GA:

restore sharded collection
restore timeseries collection

radoslaw.szulgo
September 19, 2024 at 9:16 AM

This will be easier once we get rid of mongo-tools, but still not easy: we need to keep a mapping from the old collection UUID to the new collection UUID so that the oplog can be replayed correctly.
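Oplog entries identify their collection both by namespace (`ns`) and by collection UUID (`ui`). A collection restored under a new name is created fresh and gets a new UUID, so replaying the original entries unchanged would not target it. A minimal sketch of the rewrite step, assuming oplog entries are plain dicts with the standard `op`, `ns` and `ui` fields and that the new UUID was recorded when the target collection was created. `rewrite_oplog` is hypothetical, and the sketch deliberately skips commands (`op: "c"`, including `applyOps` transaction entries), which a real implementation would also have to remap.

```python
import uuid
from typing import Any, Iterable, Iterator

CRUD_OPS = {"i", "u", "d"}  # insert, update, delete


def rewrite_oplog(
    entries: Iterable[dict[str, Any]],
    ns_from: str,
    ns_to: str,
    uuid_map: dict[uuid.UUID, uuid.UUID],
) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: yield CRUD entries for ns_from, retargeted at ns_to.

    uuid_map maps the collection UUID recorded in the backup's oplog to the
    UUID of the newly created (renamed) collection.
    """
    for entry in entries:
        # Selective replay: skip other namespaces and, in this sketch, commands.
        if entry.get("op") not in CRUD_OPS or entry.get("ns") != ns_from:
            continue
        old_ui = entry.get("ui")
        if old_ui is not None and old_ui not in uuid_map:
            # Same name, unknown UUID: e.g. the collection was dropped and
            # recreated inside the oplog window. Refuse rather than guess.
            raise KeyError(f"no mapping for collection UUID {old_ui}")
        rewritten = dict(entry)  # shallow copy; the source entry is untouched
        rewritten["ns"] = ns_to
        if old_ui is not None:
            rewritten["ui"] = uuid_map[old_ui]
        yield rewritten
```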

radoslaw.szulgo
September 17, 2024 at 12:31 PM

We’d rather not start it this year.

Aaditya Dubey
May 7, 2024 at 7:19 AM

Thank you for the report and feedback.


Done

Details

Assignee

radoslaw.szulgo

Reporter

Ivan Groenewold

Labels

cs-tag-012reviewed

Needs QA

Yes

Needs Doc

Yes

Needs Packaging

Fix versions

2.8.0

Priority

Urgent

Created May 6, 2024 at 4:23 PM

Updated January 28, 2025 at 10:01 AM

Resolved January 14, 2025 at 2:15 PM