Allow user to change the number of insertion workers per collection via command line
Description
Problem
Currently PBM runs restores with --numParallelCollections=numCPU/2 --numInsertionWorkersPerCollection=10
Having 10 workers restore data for a single collection is likely to cause fragmentation and higher-than-expected disk usage. Combined with the numParallelCollections setting, it can also lead to unexpectedly high CPU usage.
We should dial that back to match mongorestore, which uses 1 for numInsertionWorkersPerCollection by default, and let the user change it if required by specifying a parameter. Currently this requires changing the global setting with pbm config, but the restore command itself should accept it as a parameter.
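To make the concurrency concern concrete, here is a rough sketch of how many insertion workers can be active at once under the current settings. The function name is hypothetical, and it assumes numCPU/2 is integer division with a minimum of one parallel collection; PBM's actual rounding may differ.

```python
def total_insertion_workers(num_cpu: int, workers_per_collection: int = 10) -> int:
    """Estimate concurrent insertion workers during a restore.

    Assumption: numParallelCollections = numCPU / 2, rounded down,
    and never less than 1. This is illustrative, not PBM's code.
    """
    parallel_collections = max(1, num_cpu // 2)
    return parallel_collections * workers_per_collection


if __name__ == "__main__":
    # Compare the current default (10) with the mongorestore default (1).
    for cpus in (2, 8, 32):
        print(cpus, total_insertion_workers(cpus), total_insertion_workers(cpus, 1))
```

Under these assumptions, an 8-CPU host would run 4 collections in parallel with 10 workers each, so 40 insertion workers, compared with 4 when using 1 worker per collection.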
Solution
The default value stays at 10 until we clarify the impact of fragmentation on storage size
A new command line parameter is exposed: --num-insertion-workers-per-collection
The command line value overrides the value stored in the config under restore.numInsertionWorkers
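The precedence described above (command line first, then config, then the default of 10) can be sketched as follows. This is a minimal illustration, not PBM's implementation: the function and constant names are hypothetical, and None stands in for "not set".

```python
from typing import Optional

# Default kept at 10, as stated in the Solution section.
DEFAULT_INSERTION_WORKERS = 10


def resolve_insertion_workers(
    cli_value: Optional[int],
    config_value: Optional[int],
) -> int:
    """Pick the effective number of insertion workers per collection.

    cli_value    -- value of --num-insertion-workers-per-collection, or None
    config_value -- value of restore.numInsertionWorkers, or None
    """
    if cli_value is not None:
        # The command line overrides whatever the config holds.
        return cli_value
    if config_value is not None:
        return config_value
    return DEFAULT_INSERTION_WORKERS


if __name__ == "__main__":
    print(resolve_insertion_workers(None, None))  # neither set: default
    print(resolve_insertion_workers(None, 4))     # config only
    print(resolve_insertion_workers(1, 4))        # command line wins
```

Keeping the resolution in one place makes it easy to change the default later without touching the flag or config handling.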
Acceptance criteria
No performance degradation - restore is not slower after the change
Environment
None
Activity
radoslaw.szulgo December 4, 2024 at 2:04 PM
We left the default at 10 because reducing it caused a clearly visible performance degradation. What remains unknown is the impact of fragmentation and how much free disk space shrinks when more insertion workers are used. We are releasing 2.8.0 with a default of 10 and the ability to change it via the command line or the configuration file. We may change the default back to 1 in 2.9.0 once we have more data showing the impact of fragmentation.