StaleConfig error on sharded cluster after successful logical restore

Description

We have sporadic failures on CI test test_PBM-1223.py when after the successful restore test executes db.countDocuments on sharded collection and mongo returns error:

Need to investigate the issue and possible workarounds, docs:

internals

flushRouterConfig

 

Next steps:

  • tries to reproduce it locally

  • investigate the issue and try to find out if there is a workaround

  • than we will see if we need to fix it, right now it seems it affects only PSMDB 5.0

  • decision about fixing

Environment

it fails only on PSMDB 5.0

Activity

Oleksandr Havryliak September 3, 2024 at 10:34 AM

please add to the docs “ensure that balancer and chunks autosplit is disabled”

This can be checked by executing sh.status() the field autosplit:

Oleksandr Havryliak April 9, 2024 at 12:32 PM

Possible resolution is to flush router config after the restore

Oleksandr Havryliak April 9, 2024 at 11:57 AM

Done

Details

Assignee

Reporter

Labels

Found by Automation

Yes

Needs QA

Yes

Needs Doc

Yes

Sprint

Affects versions

Priority

Smart Checklist

Created March 20, 2024 at 3:27 PM
Updated September 5, 2024 at 8:44 AM
Resolved September 5, 2024 at 8:44 AM