Cluster hangs with DDL operations during backup
Description
Environment
Activity

Aaditya Dubey March 30, 2022 at 2:00 PM
Hi,
Thank you for the report.
When the cluster hangs, DMLs are affected as well, and the same improvement should fix that.

Freegle Geeks March 25, 2022 at 3:19 PM
Hi Aaditya,
Thanks for your reply, and for doing a repro scenario.
I agree that looks related. That says "the donor (or a node used for backups) completely blocked for writes". This suggests that it is only going to affect a subset of nodes. But as this example shows, any node that is doing DMLs can get blocked, and therefore the whole cluster is likely to lock up.
Could you perhaps amend to indicate that?
Hopefully the fix for that will help.

Aaditya Dubey March 25, 2022 at 3:14 PM
Hi,
Thank you for the report.
I tried to reproduce the issue and was able to do so; please find my test case below:
However, this issue is known and is being fixed by this improvement.
"Is there any way to avoid this?"
If possible, please run the backup when no DDL/DCL is running.
Thanks,
Aaditya Dubey
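
The workaround above (start the backup only when no DDL/DCL is running) could be checked with something like the following. This is a minimal sketch, not part of the original report: the function names `running_ddl_dcl` and `safe_to_start_backup` are hypothetical, the keyword list is an assumption and not exhaustive, and the rows are assumed to be dicts keyed like the columns of SHOW PROCESSLIST (only `Info` is read). How the rows are fetched is left to whatever client is in use.

```python
import re

# Statement prefixes treated as DDL/DCL in this sketch (assumed, not exhaustive).
_DDL_DCL = re.compile(
    r"^\s*(CREATE|ALTER|DROP|TRUNCATE|RENAME|GRANT|REVOKE)\b", re.IGNORECASE
)


def running_ddl_dcl(processlist_rows):
    """Return the statements in the process list that look like DDL/DCL.

    `processlist_rows` is an iterable of dicts keyed like the columns of
    SHOW PROCESSLIST; only "Info" (statement text, may be None) is read.
    """
    found = []
    for row in processlist_rows:
        info = row.get("Info")
        if info and _DDL_DCL.match(info):
            found.append(info.strip())
    return found


def safe_to_start_backup(processlist_rows):
    """Hypothetical gate: True only when no DDL/DCL statement is visible."""
    return not running_ddl_dcl(processlist_rows)
```

Such a check is only a snapshot: a DDL can still arrive after the backup has started, so it reduces the chance of the hang but does not remove it.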

Three-machine cluster running v8.0.26-16.1.
First server A is doing a full backup. No direct read/write load - it's used as a spare and to perform backups.
Second server B has a mix of read/modify ops.
Third server C has a mix of ops plus some DDL ops such as DROP/CREATE TABLE.
What seems to happen is:
A DROP TABLE op on C is performed, and C then tries to replicate it out.
On A this hangs waiting for the backup lock. The backup takes a long time, as of course it could.
Once this has happened enough, all slave threads on A are busy.
Therefore no modification ops on B or C can complete, because they can't replicate out.
Outstanding threads on B and C rapidly reach thousands.
This means the cluster is pretty much unusable.
Is there any way to avoid this?
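
The sequence described above can be illustrated with a toy model. It is a sketch of the reported behaviour only, not of the cluster's actual scheduling: it assumes that each replicated DDL occupies one applier thread on A until the backup ends, and that a DML is applied at once as long as a free thread exists. The function name `simulate_applier_stall` and the event labels are hypothetical.

```python
def simulate_applier_stall(events, applier_threads):
    """Toy model of node A while the backup lock is held.

    `events` is the ordered stream of replicated operations, each "DDL" or "DML".
    Returns a tuple:
      (index of the event that blocked the last free thread, or None,
       number of events left waiting behind it).
    """
    blocked = 0        # applier threads stuck waiting for the backup lock
    stalled_at = None  # index at which every applier thread became busy
    backlog = 0        # operations that can no longer be applied
    for i, kind in enumerate(events):
        if stalled_at is not None:
            backlog += 1  # no free thread left, the operation queues up
        elif kind == "DDL":
            blocked += 1  # this thread now waits for the backup to finish
            if blocked == applier_threads:
                stalled_at = i
        # a DML with a free thread is applied immediately and frees it again
    return stalled_at, backlog


if __name__ == "__main__":
    stream = ["DML", "DDL", "DML", "DDL", "DML", "DML", "DML"]
    print(simulate_applier_stall(stream, applier_threads=2))
```

With two applier threads, the second DDL in the stream takes the last free thread, and every later operation, DML included, queues behind it. That matches the report: once A stops applying, writes on B and C cannot complete either.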