Allow user to change the number of insertion workers per collection via command line

Description

Problem

Currently PBM runs a restore with --numParallelCollections=numCPU/2 and --numInsertionWorkersPerCollection=10.

Having 10 workers restore data for a single collection is likely to cause fragmentation and higher-than-expected disk usage. Combined with the numParallelCollections setting, it can also lead to unexpectedly high CPU usage.

We should dial that back to match mongorestore, which uses 1 for numInsertionWorkersPerCollection by default, and let the user change it when required by specifying a parameter. Currently this requires changing the global setting with pbm config, but the restore command itself should accept it as a parameter.
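To make the CPU concern concrete, here is a rough Python sketch of the arithmetic. It assumes the upper bound on simultaneously running insertion workers is the product of the two settings. The function name and the floor of one parallel collection are illustrative assumptions, not PBM code.

```python
def max_concurrent_insertion_workers(num_cpu: int, workers_per_collection: int = 10) -> int:
    """Upper bound on insertion workers running at once during a restore.

    Assumption: collections restored in parallel = numCPU/2 (integer division),
    with a floor of 1 so that single-CPU hosts still restore something.
    """
    parallel_collections = max(1, num_cpu // 2)
    return parallel_collections * workers_per_collection


if __name__ == "__main__":
    # Compare the current setting (10) with the mongorestore default (1).
    for workers in (10, 1):
        total = max_concurrent_insertion_workers(16, workers)
        print(f"16 CPUs, {workers} worker(s) per collection: up to {total} workers")
```

On a 16-CPU host the current settings allow up to 8 x 10 = 80 insertion workers, compared with 8 when each collection gets a single worker.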

 

Solution

  • The default value stays at 10 until we clarify the fragmentation impact on the storage size

  • A new command line parameter is exposed: --num-insertion-workers-per-collection

  • The command line value overrides the one stored in the config under restore.numInsertionWorkers
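The precedence described in the list above could be sketched as follows in Python. The function and constant names are hypothetical, and None stands for "not set"; PBM's actual implementation may differ.

```python
from typing import Optional

# Default kept at 10 per this ticket.
DEFAULT_INSERTION_WORKERS = 10


def resolve_insertion_workers(cli_value: Optional[int],
                              config_value: Optional[int]) -> int:
    """Pick the effective number of insertion workers per collection.

    Order of precedence (first one that is set wins):
      1. --num-insertion-workers-per-collection on the command line
      2. restore.numInsertionWorkers from the stored config
      3. the built-in default
    """
    if cli_value is not None:
        return cli_value
    if config_value is not None:
        return config_value
    return DEFAULT_INSERTION_WORKERS
```

For example, with restore.numInsertionWorkers set to 4 in the config, a restore started with --num-insertion-workers-per-collection=1 would use 1 worker, and a restore started without the flag would use 4.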

 

Acceptance criteria

  • No performance degradation - restore is not slower after the change

Environment

None

Activity

radoslaw.szulgo December 4, 2024 at 2:04 PM

We left the default at 10 due to clearly visible performance degradation with a lower value. The unknown is the impact of fragmentation and how much free disk space shrinks when using more insertion workers. We are releasing 2.8.0 with a default of 10 and the ability to change it via the command line or the configuration file. We may change the default back to 1 in 2.9.0 once we have more data showing the impact of fragmentation.

Done

Details

Needs Review

Yes

Needs QA

Yes

Needs Doc

Yes

Created October 9, 2024 at 2:30 PM
Updated December 12, 2024 at 5:17 PM
Resolved December 5, 2024 at 10:20 AM