PITR incremental backup fails with stale lock

Description

Hi!

From time to time, our PITR incremental backup fails with stale lock:

This happens while a backup is running (`pbm backup`).

Restarting all the pbm-agent processes and then taking a full backup fixes this temporarily, but that is not sustainable when dealing with a larger number of clusters.

Thanks!

Environment

  • Ubuntu 18.04

  • MongoDB 4.2.12

AFFECTED CS IDs

CS0019129


Activity


Sveta Smirnova August 27, 2021 at 12:43 PM

Bug is not repeatable in 1.6.0 after fix for

Rafael Galinari July 15, 2021 at 2:15 PM
Edited

Hi Team, just for the record: I could not reproduce this. Tests I have done so far:

1) Started a pbm backup while PITR was running. Result: the backup never started, with the message "another operation in progress" (which is expected).

2) PITR tried to start a job while a PBM snapshot was ongoing. Result: according to the agent's output, the lock was never granted and the backup never started.

I am wondering why a backup set is marked with the stale lock error even though no locks are registered in the PBM collections.

Also, agents are supposed to resolve locks, and if an operation is holding a lock, a parallel operation should never create entries or hold a "nil" lock.
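
To make the expected behaviour above concrete, here is a minimal sketch of heartbeat-based lock handling. It is not PBM's actual implementation: the `Lock` fields, the `try_acquire` helper, and the 30-second threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold: seconds without a heartbeat before a lock counts as stale.
STALE_AFTER_SEC = 30


@dataclass
class Lock:
    """Hypothetical, simplified view of a lock document."""
    op_type: str    # e.g. "backup" or "pitr"
    replset: str    # replica set the lock applies to
    heartbeat: int  # time of the last heartbeat, in seconds


def is_stale(lock: Lock, now: int, stale_after: int = STALE_AFTER_SEC) -> bool:
    """A lock is stale when its holder has not sent a heartbeat recently."""
    return now - lock.heartbeat >= stale_after


def try_acquire(held: Optional[Lock], wanted: Lock, now: int) -> str:
    """Return the outcome of a parallel operation asking for the lock."""
    if held is None:
        # Nothing is registered, so the new operation may proceed.
        return "acquired"
    if is_stale(held, now):
        # The holder stopped sending heartbeats; report it instead of proceeding.
        return "stale lock"
    # A live operation holds the lock; the newcomer must not start.
    return "another operation in progress"
```

Under these assumptions, a "stale lock" outcome requires a lock to be registered in the first place, which is why seeing that error with no lock entries present is surprising.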

Cannot Reproduce


Created April 19, 2021 at 10:50 AM
Updated March 5, 2024 at 7:04 PM
Resolved August 27, 2021 at 12:43 PM