Make writeConcern configurable

Description

It seems pbm-agent always uses writeConcern: majority, with no way to override it via --mongodb-uri or $PBM_MONGODB_URI. It looks like it is set here and overrides whatever is set in the connection string: https://github.com/percona/percona-backup-mongodb/blob/main/pbm/pbm.go#L441-L446
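
For reference, a minimal sketch of the pattern the linked code appears to follow, using the mongo-go-driver (the package and function names below are illustrative, not PBM's actual code): a write concern set programmatically on the client options takes precedence over any w=... option in the connection string.

    package example

    import (
        "context"

        "go.mongodb.org/mongo-driver/mongo"
        "go.mongodb.org/mongo-driver/mongo/options"
        "go.mongodb.org/mongo-driver/mongo/writeconcern"
    )

    func connect(ctx context.Context, uri string) (*mongo.Client, error) {
        // ApplyURI picks up any write concern from the connection string (e.g. ?w=2),
        // but the explicit SetWriteConcern call below overrides it with majority.
        opts := options.Client().ApplyURI(uri).SetWriteConcern(writeconcern.Majority())
        return mongo.Connect(ctx, opts)
    }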

 

However, in some cases it would be beneficial to use a different writeConcern.

Example from a customer case: the customer has two datacenters with three mongod nodes each, plus an arbiter in a third datacenter. The application uses w:3 and can tolerate the loss of any single datacenter.
However, if a datacenter fails, backups stop working until the replica set is reconfigured: PBM uses w:majority, which here requires four acknowledgements (a majority of the seven voting members), but the arbiter cannot acknowledge writes and only three data-bearing nodes remain.

Environment

None

AFFECTED CS IDs

CS0025351, CS0043183

has to be finished together with

Smart Checklist

Activity


Ivan Groenewold March 4, 2024 at 3:13 PM

Both read concern and write concern should be configurable by the PBM user. Keep majority as the default, but allow the user to change it. My 2c

Sami Ahlroos February 16, 2024 at 8:05 AM

Hi,
Good point about restores, and I agree w:majority makes sense in that case. I didn’t think of that.

We can't really predict the state of a customer's replica set when they get into a situation where they have to restore a backup. But it's probably fair to say that if you have to restore and your second datacenter is down, you must first reconfigure the replica set so that w:majority works.

Boris Ilijic February 15, 2024 at 10:15 AM

Hi,

Thank you for explaining the example of the customer's use case.

The team has agreed that allowing a less restrictive write concern (not just 'majority') would be possible and beneficial, but at the same time that approach cannot be applied to all PBM operations, mainly the restore of a physical backup.

As you suggested, we can allow injecting a write concern option via the connection string, for example:
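
(Hypothetical values below; whether pbm-agent honors the URI option is exactly what this ticket asks for.)

    export PBM_MONGODB_URI="mongodb://pbm:***@localhost:27017/?authSource=admin&replicaSet=rs0&w=2"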

I still need to analyze the full impact of this change. However, the idea is to allow a less restrictive write concern for the 'backup' operation while keeping the restrictive one ('majority') for the 'restore' operation. A related point: for the restore operation we simply cannot guarantee that the cluster will not end up broken if the restore command is applied while not all members are up and running (especially for a physical restore).
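
Purely as an illustration of that idea (the function and parameter names are hypothetical, not an existing PBM API), the connection layer could pick the write concern per operation along these lines:

    package example

    import "go.mongodb.org/mongo-driver/mongo/writeconcern"

    // writeConcernFor keeps 'majority' for restore, but lets other operations
    // use a less restrictive write concern supplied by the user (e.g. via ?w=2
    // in the connection string). Hypothetical sketch, not PBM code.
    func writeConcernFor(op string, userWC *writeconcern.WriteConcern) *writeconcern.WriteConcern {
        if op == "restore" || userWC == nil {
            return writeconcern.Majority() // restore stays strict; default unchanged
        }
        return userWC
    }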

It would be great if you could share the relevant experience with the mentioned customer. Do they have all cluster members running when they trigger a restore? E.g. they can have a majority with 5+1 or even 4+1 members, but I guess they don't do a restore in such a case. Thank you for the feedback.

Sveta Smirnova March 7, 2022 at 10:23 AM

is similar to this one. I am not closing one of them as a duplicate because the use cases are a bit different, but whoever implements this needs to consider it as well.

Done

Details

Assignee

Reporter

Needs QA

Yes

Needs Doc

Yes

Sprint

Fix versions

Affects versions

Priority

Smart Checklist

Created March 4, 2022 at 1:34 PM
Updated July 3, 2024 at 10:09 AM
Resolved May 14, 2024 at 10:30 AM