PITR incremental backup fails with stale lock

Description

Hi!

From time to time, our PITR incremental backup fails with a stale lock error.

This happens while a backup is running (`pbm backup`).

Restarting all pbm-agent processes and taking a full backup fixes this temporarily, but that is not sustainable when dealing with a larger number of clusters.

Thanks!

Environment

Ubuntu 18.04
MongoDB 4.2.12

AFFECTED CS IDs

CS0019129

Activity

Sveta Smirnova August 27, 2021 at 12:43 PM

Bug is not repeatable in 1.6.0 after fix for

Rafael Galinari July 15, 2021 at 2:15 PM
Edited

Hi Team, just for the record: I could not reproduce this. Tests I have done so far:

1) Started a pbm backup while PITR was running. Result: the backup never started and reported the message "another operation in progress" (which is expected).

2) PITR tried to start a job while a PBM snapshot was ongoing. Result: according to the agent's output, the lock was never granted and the backup never started.

I am wondering why a backup set is marked with the stale lock error even though no locks are registered in the PBM collections.

Also, agents are supposed to resolve locks, and if an operation is holding a lock, a parallel operation should never create entries or hold a "nil" lock.

Cannot Reproduce

Details
Assignee: andrew.pogrebnoi
Reporter: Pedro Albuquerque
Affects versions: 1.4.1, 1.5.0
Priority: High

Created April 19, 2021 at 10:50 AM

Updated March 5, 2024 at 7:04 PM

Resolved August 27, 2021 at 12:43 PM
