Automate using a physical backup to seed a lost node
Description
Activity

Ez Ulaganathan November 8, 2024 at 3:57 PM
This feature would help perform a full resync (initial sync) of a replica set node that has been down for so long that the primary's oplog no longer holds the entries it needs to catch up. Our database is very large, so being able to restore from incremental PBM backups would be helpful.

radoslaw.szulgo September 19, 2024 at 9:11 AM
This is a bigger task: it will take one month or longer. We have put it into the backlog and will reconsider it in the context of the work on selective physical backup/restore.
Also, this might be better solved with a separate new tool that we could build.
Workaround: copy the files independently and join the node to the replica set.

Dmytro Zghoba September 19, 2024 at 9:02 AM
Technical things to consider:
the target node cannot be part of a replica set:
the node should be in the PRIMARY state (writable)
the node should not be replicating at the same time
the node's cluster time should not be later than the replica set's
there should be no gap in the oplog between the replica set primary and the joining node

radoslaw.szulgo August 22, 2024 at 9:19 AM
Flag added
This needs to be researched on how to solve it.

Ivan Groenewold May 31, 2024 at 2:00 PM
I am reopening this because we are confusing two different things. File-copy-based initial sync is great, but the ability to seed a new node directly from a backup is still useful in most scenarios.
File-copy-based initial sync has a performance impact on the donor node, and it adds network bandwidth and data transfer costs that depend on the cloud provider. Downloading from backup storage could also be much faster because the download can run in parallel.
Details
Assignee: radoslaw.szulgo
Reporter: Jean da Silva
Needs QA: Yes
Priority: Medium

Scenario:
One use of physical backups is to recover a lost node by using the backup files as a seed:
We copy the backup files onto a new node.
We add the new node with rs.add().
If the source's oplog window still covers the time the backup was taken, the new node only has to apply the oplog difference.
This alternative is useful when a node is lost to a replication issue and can no longer catch up with the source, becoming stale.
Instead of triggering an initial sync, which is a resource-intensive operation, we could use a physical backup from PBM to seed that node and let it apply only the oplog difference.
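The oplog-window condition can be sketched as a simple comparison. This is only an illustration: the OpTime type and the can_seed_from_backup function are hypothetical names, not part of PBM or MongoDB, and the timestamps are assumed to have been read beforehand from the backup metadata and from the oldest entry in the source's oplog.

```python
from typing import NamedTuple


class OpTime(NamedTuple):
    """A MongoDB-style timestamp: seconds since the epoch plus an ordinal."""
    seconds: int
    increment: int


def can_seed_from_backup(backup_last_write: OpTime, source_oplog_first: OpTime) -> bool:
    """Return True if the source's oplog still reaches back to the backup.

    The seeded node resumes replication from the backup's last write, so the
    oldest entry still kept in the source's oplog must be no newer than that.
    """
    # NamedTuple instances compare field by field: seconds first, then increment.
    return source_oplog_first <= backup_last_write


# Example values, made up for illustration.
backup_ts = OpTime(1_700_000_500, 3)
print(can_seed_from_backup(backup_ts, OpTime(1_700_000_000, 1)))  # oplog starts before the backup
print(can_seed_from_backup(backup_ts, OpTime(1_700_000_900, 1)))  # oplog has rolled past the backup
```

If the check fails, the seeded node would become stale again right away, so a newer backup or a regular initial sync is needed.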
From upstream documentation:
Sync by Copying Data Files from Another Member
Current Scenario:
PBM has no option to seed a lost node; this must be done manually:
1. Take a physical backup, preferably with a compression method that is easy to work with, such as gzip or even none.
2. Delete the data files from the stale instance.
3. Copy the backup files from the storage location to the dbpath on the stale node, adjusting permissions accordingly (if the files are compressed, decompress them first).
4. Start the stale node.
At that point, the node has to apply the oplog difference from the time of the backup up to the last operation on the source.
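Steps 2 and 3 of the manual procedure could be scripted roughly as follows. This is a minimal sketch, not something PBM provides: reseed_dbpath is a hypothetical helper, it assumes mongod is already stopped and that the backup has been downloaded and decompressed into a local directory, and it assumes the group of the data files has the same name as the owning user.

```python
import shutil
from pathlib import Path
from typing import Optional


def reseed_dbpath(backup_dir: Path, dbpath: Path, owner: Optional[str] = None) -> None:
    """Replace the contents of dbpath with the files from backup_dir.

    mongod must be stopped before calling this. backup_dir is assumed to hold
    an already downloaded and decompressed copy of the physical backup.
    """
    # Refuse to delete anything if there is no backup to copy from.
    if not backup_dir.is_dir():
        raise FileNotFoundError(f"backup directory not found: {backup_dir}")

    # Step 2: delete the stale data files, keeping the dbpath directory itself.
    dbpath.mkdir(parents=True, exist_ok=True)
    for entry in dbpath.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # Step 3: copy the backup files into the now-empty dbpath.
    shutil.copytree(backup_dir, dbpath, dirs_exist_ok=True)

    # Step 3 (continued): hand the files to the account that runs mongod.
    # Assumption: the group has the same name as the user.
    if owner is not None:
        for path in [dbpath, *dbpath.rglob("*")]:
            shutil.chown(path, user=owner, group=owner)
```

A call might look like reseed_dbpath(Path("/tmp/pbm-backup"), Path("/var/lib/mongo"), owner="mongod"), where both paths and the account name are placeholders for the actual environment. Starting the node (step 4) is left to the service manager.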
Proposed Scenario:
Add a PBM option that performs this process and seeds a lost node.
It must halt and warn when a restore is attempted against a PRIMARY node.
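One way the PRIMARY guard could work is sketched below. It is an assumption about a possible design, not PBM's implementation: ensure_not_primary and RestoreRefusedError are hypothetical names, and the function expects the reply of MongoDB's hello command as a dictionary (with PyMongo this could be obtained via client.admin.command("hello")).

```python
from typing import Any, Mapping


class RestoreRefusedError(RuntimeError):
    """Raised when seeding must not proceed on the target node."""


def ensure_not_primary(hello: Mapping[str, Any]) -> None:
    """Halt with a warning if the hello reply comes from a replica set PRIMARY."""
    # `hello` reports isWritablePrimary; the legacy isMaster reply uses ismaster.
    is_writable = bool(hello.get("isWritablePrimary") or hello.get("ismaster"))
    # Only replica set members report setName, so a writable standalone
    # instance is not treated as a PRIMARY here.
    if is_writable and "setName" in hello:
        host = hello.get("me", "unknown host")
        raise RestoreRefusedError(
            f"refusing to seed {host}: it is the PRIMARY of replica set "
            f"{hello['setName']}"
        )


# A secondary passes silently; a primary raises RestoreRefusedError.
ensure_not_primary({"setName": "rs0", "isWritablePrimary": False, "secondary": True})
```

Raising an exception, rather than returning a flag, makes it harder for a caller to ignore the check and overwrite the data files of a live primary.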