Remove any db queries during physical restore

Description

For a physical restore, we cannot assume that the database nodes are up and available before the restore starts. The restore is very likely required precisely because the cluster is in bad shape.

The PBM agent currently depends on the local mongod process (and, in sharded clusters, on the config server replica set) and will not start if it cannot communicate with them.

We should still be able to start a restore by reading all required data directly from the configured backup location. Storage-based communication is sufficient for a physical restore because it does not matter which node is primary.

This will require saving the following information as part of backup metadata:

For replica sets:

replica set name
FQDN and port of each member
type of each member (data node or arbiter)

For sharded clusters:

config server replica set name
FQDN and port of all members of the config server replica set
replica set name of each shard
FQDN and port of all members of each shard
type of each member (data node or arbiter)

Environment

None

Details

Assignee

Unassigned

Reporter

Ivan Groenewold

Needs QA

Yes

Priority

Medium

Parent

PBM-1335 Physical restore assumes a functioning cluster

Created February 27, 2025 at 11:36 AM

Updated March 4, 2025 at 5:10 PM
