No BF-abort but 'MDL conflict ... solved by abort' printed
Activity

Kamil Holubicki October 12, 2023 at 12:58 PM (edited)
Hi,
Thank you for providing me with the log files.
What I see inside is this pattern:
It happens when we do something like this:
node_1:
node_2:
If you are lucky and the query from node_2 is applied while the SELECT on node_1 is still running, you will see that SELECT in the log.
If the SELECT has already finished (note that the transaction still holds its MDL locks), node_1's transaction will still be aborted, but in the logs we will see something like this:
What happened is that your local transaction was aborted by a replicated transaction, which is expected.
Case 2:
node_1, session 1:
node_1, session 2:
This time, session 2 will wait for session 1 to finish its transaction. In the logs you will see:
In this case the log is wrong. Nothing was aborted; session 2 waits on the MDL lock until session 1 finishes.

Arkadiusz Petruczynik October 11, 2023 at 11:05 AM
Ad. 1 During normal workload.
Ad. 2 Every time the application creates views. I am attaching the error log before the node disconnected.
Ad. 3 Similar operations can be performed based on the attached log.
Ad. 4,5 We will try to do it ASAP in a test environment. For now, in production, we have redirected the application creating views (we only have one) to the Percona 8.0.34 standalone server.

Kamil Holubicki October 11, 2023 at 8:40 AM (edited)
Hi,
I'm not sure if this is the return of as I can't reproduce it using those steps to reproduce. But it looks similar, indeed.
A few questions:
1. Does it happen during the upgrade of the node, or during normal workload?
2. How often does it happen?
3. Are you able to provide some deterministic steps to reproduce the issue?
4. Could you start the server with wsrep_debug=1 and collect the logs around the time the issue occurs?
5. Do you see any abnormal behavior after this log message? Does something stop working, crash, hit an assert, etc.?
I analyzed the code a bit and here are my findings:
It seems that the message itself is wrong and scares people when it shouldn't, saying that the ticket is solved by abort. The message is printed from here. It is printed when wsrep_handle_mdl_conflict() returns false. If we look inside wsrep_handle_mdl_conflict(), we see that returning false doesn't necessarily mean aborting the victim transaction. Actually, this function always returns false, but it does not always BF-abort. This means that the requestor thread, after calling wsrep_handle_mdl_conflict(), needs to wait for the MDL lock.
Details
Assignee: Unassigned
Reporter: Arkadiusz Petruczynik
Needs Review: Yes
Needs QA: Yes
Priority: Low

https://forums.percona.com/t/another-mdl-conflict-db-table-ticket-10-solved-by-abort-issue/25517
We have the same problem after updating the cluster from version 8.0.32-24.2 to version 8.0.33-25.
Please note that we do not use the garbd package as described in
https://docs.percona.com/percona-xtradb-cluster/8.0/release-notes/8.0.33-25.upd.html