Add pbm cancel-restore command
Description
Activity

radoslaw.szulgo February 4, 2025 at 12:49 PM
We first need to conclude the research on alternative communication for PBM before refining this story.

radoslaw.szulgo January 21, 2025 at 12:42 PM
- Can we leverage the “emergency mode” from the other ticket to also cancel a physical restore?
(as suggested by )

Boris Ilijic August 28, 2024 at 3:14 PM
The main conclusions after the clarification meeting:
Bugs that PBM potentially has (e.g., an inability to start the restore due to an internal lock) will be treated as bugs in separate Jira tickets. If the support team encounters such a bug, they will open a new ticket.
This ticket is still valid as a new feature/improvement because customers might want to cancel the restore procedure. A typical use case: the customer doesn’t want to wait for the restore procedure to finish because it is time-consuming (a few hours), and/or the customer wants to restore another backup instead.
We also agreed that the requirement is to have a fully operational cluster after the cancel-restore procedure. User data (dbs/collections) can be in any state (completely or partially deleted, completely or partially restored), but the cluster itself should be operational.
For now, we’ll decrease the priority of this feature, remove it from the next sprint, and keep it in the backlog for later.
Some technical considerations/limitations:
The feature is only feasible for logical restore. For physical restore, we cannot guarantee that the cluster will be operational after the restore is canceled.
Partially deleting/restoring users and roles will lead to a non-operational cluster, so we need to overcome that issue.
The restore itself is done using mongo-tools, so we need to see to what degree we can customize cancellation there.
The remapping feature should also be investigated in terms of cluster stability after cancellation.
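To make the "fully operational cluster" requirement above easier to discuss, here is a minimal mongosh-style sketch of one possible check for a single replica set. The function name `isReplicaSetOperational` and the criteria themselves (the node answers `ping` and `replSetGetStatus` reports a PRIMARY member) are assumptions for illustration only, not agreed acceptance criteria. On a sharded cluster, a check like this would presumably have to run against each shard and the config server replica set.

```javascript
// Sketch only: one possible definition of "operational" for a single replica set.
// `adminDb` is any object exposing adminCommand(cmd), e.g. `db` in mongosh.
// The function name and the criteria are hypothetical, not part of PBM.
function isReplicaSetOperational(adminDb) {
  try {
    // The node must respond to a basic ping.
    const ping = adminDb.adminCommand({ ping: 1 });
    if (ping.ok !== 1) return false;

    // The replica set must report its members, one of them being PRIMARY.
    const status = adminDb.adminCommand({ replSetGetStatus: 1 });
    if (status.ok !== 1 || !Array.isArray(status.members)) return false;
    return status.members.some((m) => m.stateStr === 'PRIMARY');
  } catch (err) {
    // A command that throws is treated as "not operational".
    return false;
  }
}
```

In mongosh this could be called as `isReplicaSetOperational(db)` after a canceled restore.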

radoslaw.szulgo August 22, 2024 at 9:40 AM
to clarify with .
For logical - it should be feasible, but we need to clarify the expected result/state.
For physical - we can eventually terminate agents. What else?

radoslaw.szulgo August 22, 2024 at 9:29 AM
Flag added
Not clear what’s expected. Is this meant to work for both physical and logical restores?
Problem description
Currently, if a restore operation is stuck, it is not clear what the user needs to do. In practice, this can happen when the restore hasn’t even started, for example due to some issue with the agent.
We need a pbm cancel-restore command that actually aborts the operation, leaving the cluster in whatever state it is in. In practice, this means running:
db.getSiblingDB('admin').pbmLock.deleteMany({ type: 'restore' })
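A slightly more cautious variant of the manual lock removal above, as a mongosh-style sketch: it lists the matching lock documents first and only deletes them when explicitly asked to. The helper name `clearRestoreLocks`, the `dryRun` option, and the returned shape are hypothetical. The `{ type: 'restore' }` filter is taken from this ticket as-is. This is not how a real `pbm cancel-restore` would necessarily be implemented.

```javascript
// Sketch only: inspect restore locks before removing them.
// `coll` is any object exposing find(filter).toArray() and deleteMany(filter),
// e.g. db.getSiblingDB('admin').pbmLock in mongosh.
// The helper name, the dryRun option and the return shape are hypothetical.
function clearRestoreLocks(coll, { dryRun = true } = {}) {
  const filter = { type: 'restore' }; // filter as given in the ticket
  const locks = coll.find(filter).toArray();

  // By default, only report what would be removed.
  if (dryRun || locks.length === 0) {
    return { found: locks.length, deleted: 0 };
  }

  const res = coll.deleteMany(filter);
  return { found: locks.length, deleted: res.deletedCount };
}
```

In mongosh, `clearRestoreLocks(db.getSiblingDB('admin').pbmLock)` would only report the locks found, and passing `{ dryRun: false }` as the second argument would delete them.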
Acceptance criteria