Physical restore assumes a functioning cluster

Description

Problem description

With the current design, PBM cannot restore a physical backup if the target cluster is broken (e.g., one of the shards is unhealthy or has no primary). In a real-world scenario, we want to be able to restore a cluster without having to fix anything first. As long as the target hosts are okay, the restore should work.

With physical backups, the agents already coordinate among themselves by creating files on the backup storage (e.g., an S3 bucket). A physical restore should therefore not rely on the MongoDB replication mechanism or node states in any way.
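To make the storage-based coordination concrete, here is a minimal sketch of how agents could signal progress through marker files without touching MongoDB. It assumes a local directory as a stand-in for the bucket; the file layout (`<restore>/<node>.<status>`) and the function names are hypothetical and are not PBM's actual format or API.

```python
import time
from pathlib import Path


def mark_status(storage: Path, restore: str, node: str, status: str) -> None:
    """Publish a node's status by creating an empty marker file (hypothetical layout)."""
    marker = storage / restore / f"{node}.{status}"
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()


def nodes_with_status(storage: Path, restore: str, status: str) -> set[str]:
    """Return the names of all nodes that have published the given status."""
    restore_dir = storage / restore
    if not restore_dir.is_dir():
        return set()
    suffix = f".{status}"
    return {p.name.removesuffix(suffix) for p in restore_dir.glob(f"*{suffix}")}


def wait_for_all(storage: Path, restore: str, expected: set[str], status: str,
                 timeout: float = 5.0, poll: float = 0.05) -> bool:
    """Poll the storage until every expected node has published the status.

    Returns False if the timeout expires first. No replica set state is consulted.
    """
    deadline = time.monotonic() + timeout
    while True:
        if expected <= nodes_with_status(storage, restore, status):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

In this sketch, each agent would call `mark_status` when it finishes a step, and any agent could call `wait_for_all` before moving on. A missing primary does not affect either call, because only the storage is read and written.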

Proposed solution

  • Introduce an “emergency mode” (ideally selected automatically) for when the target cluster is broken.

    • Users might not know how to check whether a cluster is broken, or which mode to use.

    • The user is notified via logs/output that emergency mode is being used because the cluster is unhealthy, along with the reason.

  • PBM agents communicate with each other over HTTP instead of through the MongoDB database.
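The automatic mode selection described above can be sketched as follows. This is only an illustration under stated assumptions: the `ShardState` fields, the mode names, and the reason strings are hypothetical, and how the health information is actually gathered is left open.

```python
from dataclasses import dataclass


@dataclass
class ShardState:
    """Health summary for one shard, as observed by an agent (hypothetical shape)."""
    name: str
    reachable: bool    # could the replica set be queried at all?
    has_primary: bool  # does the replica set currently have a primary?


def select_restore_mode(shards: list[ShardState]) -> tuple[str, list[str]]:
    """Pick "standard" or "emergency" and collect the reasons to show the user."""
    reasons = []
    for shard in shards:
        if not shard.reachable:
            reasons.append(f"shard {shard.name}: not reachable")
        elif not shard.has_primary:
            reasons.append(f"shard {shard.name}: no primary")
    mode = "emergency" if reasons else "standard"
    return mode, reasons
```

Returning the reasons together with the mode lets the caller log why emergency mode was chosen, so the user does not have to diagnose the cluster first.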

Acceptance Criteria

  • Physical restore works if a shard doesn’t have a primary instance

Documentation

  • Within the restore procedure for physical backups, add a section on restoring in emergency mode.

  • The topic should describe the situations in which an emergency restore is needed.

  • It should describe how to run it.

  • Describe the side effects or differences in results (if any) of using emergency mode compared to the standard restore.

Environment

None

Activity

Boris Ilijic February 27, 2025 at 10:41 AM

Converted it to the epic and linked

Added child tickets to this epic: & .

radoslaw.szulgo September 12, 2024 at 10:12 AM

Moved under the

Jan Mynar September 12, 2024 at 10:02 AM

Please create an incubator item for this improvement, and take this topic to the MongoDB leaders forum.

Aaditya Dubey June 7, 2024 at 5:30 AM

Hi

Thank you for the report and feedback.

Details

Needs QA

Yes

Created June 6, 2024 at 12:45 PM
Updated February 27, 2025 at 10:42 AM