Cluster hangs with DDL operations during backup

Description

Three-machine cluster running v8.0.26-16.1.

  1. Server A is taking a full backup. It has no direct read/write load; it is used as a spare and to perform backups.

  2. Server B handles a mix of read and modify operations.

  3. Server C handles a similar mix plus some DDL operations such as DROP TABLE and CREATE TABLE.

What seems to happen is:

  • A DROP TABLE is executed on C, which then tries to replicate it to the other nodes.

  • On A, applying it hangs while waiting for the backup lock. The backup takes a long time, as it legitimately can.

  • Once this has happened enough times, all slave threads on A are busy.

  • As a result, no modification operations on B or C can complete, because they cannot be replicated.

  • The number of outstanding threads on B and C rapidly climbs into the thousands.

  • The cluster becomes effectively unusable.
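The chain of events above can be illustrated with a toy model. This is a minimal Python sketch, not PXC code: `BackupNode`, `applier_threads`, `fc_limit` and `simulate` are hypothetical names, with `applier_threads` and `fc_limit` standing in loosely for `wsrep_slave_threads` and Galera's `gcs.fc_limit`. It assumes, as described above, that each replicated DDL holds an applier thread on A for as long as the backup lock is held, while DML applies immediately. It also assumes the stall spreads through Galera flow control: once every thread is held, incoming writesets pile up in A's receive queue, and when that queue exceeds the limit, replication pauses for the whole cluster, which would explain why writes on B and C stop completing.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BackupNode:
    """Toy model of backup node A (hypothetical, not PXC internals)."""
    applier_threads: int            # loose stand-in for wsrep_slave_threads
    fc_limit: int                   # loose stand-in for gcs.fc_limit
    backup_lock_held: bool = True   # full backup in progress
    busy: int = 0                   # applier threads stuck on the backup lock
    recv_queue: deque = field(default_factory=deque)

    def receive(self, writeset: str) -> None:
        """Queue a replicated writeset and apply whatever we can."""
        self.recv_queue.append(writeset)
        self._apply()

    def _apply(self) -> None:
        # Hand queued writesets to free applier threads, in arrival order.
        while self.recv_queue and self.busy < self.applier_threads:
            writeset = self.recv_queue.popleft()
            if writeset.startswith("DDL") and self.backup_lock_held:
                # DDL waits for the backup lock and keeps its thread.
                self.busy += 1
            # Otherwise the writeset applies at once and frees its thread.

    def flow_control_paused(self) -> bool:
        """True when the receive queue exceeds the flow-control limit."""
        return len(self.recv_queue) > self.fc_limit

    def finish_backup(self) -> None:
        """Release the backup lock: blocked DDL completes, threads free up."""
        self.backup_lock_held = False
        self.busy = 0
        self._apply()


def simulate(stream, node):
    """Return the index of the writeset that triggers a cluster-wide pause, or None."""
    for i, writeset in enumerate(stream):
        node.receive(writeset)
        if node.flow_control_paused():
            return i
    return None


if __name__ == "__main__":
    node = BackupNode(applier_threads=2, fc_limit=3)
    stream = ["DML", "DDL drop t1", "DML", "DDL drop t2"] + ["DML"] * 4
    print(simulate(stream, node))   # 7: the cluster pauses on the 8th writeset
    node.finish_backup()
    print(len(node.recv_queue))     # 0: the queue drains once the backup ends
```

The model deliberately ignores details such as commit ordering and Total Order Isolation; it only shows why a single node that cannot drain its queue can eventually stall writers on every node.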

Is there any way to avoid this?

Environment

None

Activity

Aaditya Dubey March 30, 2022 at 2:00 PM

Hi,

Thank you for the report.
Because the cluster hangs, DMLs are affected as well, and this should be fixed by the same improvement.

Freegle Geeks March 25, 2022 at 3:19 PM

Hi Aaditya,

Thanks for your reply, and for doing a repro scenario.

I agree that looks related. That issue says "the donor (or a node used for backups) completely blocked for writes". This suggests that it is only going to affect a subset of nodes. But, as this example shows, any node that is running DMLs can get blocked, so the whole cluster is likely to lock up.

Could you perhaps amend it to indicate that?

Hopefully the fix for that will help.

Aaditya Dubey March 25, 2022 at 3:14 PM

Hi,

Thank you for the report.

I tried to reproduce the issue and was able to. Please find my test case below:

However, this issue is already known and is being fixed by this improvement.

"Is there any way to avoid this?"

If possible, please run the backup when no DDL/DCL statements are running.

Thanks,
Aaditya Dubey
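A pre-flight check along the lines of the workaround suggested above (run the backup only when no DDL/DCL is running) might look like the following minimal Python sketch. It assumes the caller has already collected the INFO column of `information_schema.PROCESSLIST` from every node; connection code is omitted, and `running_ddl_dcl`, `safe_to_start_backup` and the per-node dict shape are hypothetical. Every node is checked because, in this report, the DDL that blocks on A is issued on C.

```python
import re

# Leading keywords treated as DDL/DCL (assumption: extend to suit your workload).
_DDL_DCL = re.compile(
    r"^\s*(CREATE|ALTER|DROP|RENAME|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def running_ddl_dcl(info_values):
    """Return the statements that look like DDL/DCL.

    info_values: the INFO column of information_schema.PROCESSLIST,
    where idle sessions report None.
    """
    return [q for q in info_values if q and _DDL_DCL.match(q)]


def safe_to_start_backup(per_node_info):
    """True only if no node currently shows a DDL/DCL statement.

    per_node_info: mapping of node name -> list of INFO values (hypothetical shape).
    """
    return all(not running_ddl_dcl(info) for info in per_node_info.values())


if __name__ == "__main__":
    snapshot = {
        "A": [None],
        "B": ["SELECT * FROM t", "UPDATE t SET x = 1"],
        "C": ["drop table t_old"],
    }
    print(safe_to_start_backup(snapshot))  # False: C is running a DROP TABLE
```

Such a check only covers the moment the backup starts. DDL issued after that will still wait for the backup lock on A, so DDL also has to be kept out of the whole backup window, for example by scheduling it separately.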

Duplicate

Details

Created January 27, 2022 at 9:06 AM
Updated October 23, 2024 at 8:15 AM
Resolved March 25, 2022 at 3:15 PM