MDL BF-BF conflict between two applier threads
Description
Environment
None
AFFECTED CS IDs
CS0053911
Activity
jinyou.ma 4 days ago
Hi @Kamil Holubicki, this should not be part of the escalation. Let me create a new one.
jinyou.ma last week
@Kamil Holubicki The issue can also be reproduced without triggers:
create tables
USE test;
CREATE TABLE t1 (
  i int NOT NULL AUTO_INCREMENT,
  d int,
  PRIMARY KEY (i)
) ENGINE=InnoDB;
CREATE TABLE t2 like t1;
INSERT INTO t2 values (1,1);
INSERT INTO t1 values (1,1);
Session 1 executes DML
for i in {1..1000}; do
  ./n1 -BNe "UPDATE test.t2 JOIN test.t1 USING (i) SET t2.d = t2.d+1, t1.d = t1.d;"
done
Session 2 executes DDL
for i in {1..1000}; do
  ./n1 -BNe "ALTER TABLE test.t1 ENGINE = InnoDB;"
done
Kamil Holubicki April 29, 2025 at 6:41 AM
I think the workaround only needs to be applied for the duration of the DDL execution. After that, you can go back to wsrep_applier_threads=4.
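The suggestion above can be sketched as follows; a minimal sketch, assuming the DDL is issued from a client session and that 4 is the normal setting (statement values are illustrative):

```sql
-- Sketch: reduce applier concurrency only while the DDL runs, then restore it.
-- Apply on each node that will replicate and apply the DDL.
SET GLOBAL wsrep_applier_threads = 1;  -- workaround active
ALTER TABLE test.t ENGINE = InnoDB;    -- the DDL that triggers the conflict
SET GLOBAL wsrep_applier_threads = 4;  -- restore the normal setting
```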
jinyou.ma April 25, 2025 at 1:35 AM
There is a workaround by using a single wsrep applier thread.
SET GLOBAL wsrep_applier_threads = 1;
jinyou.ma April 29, 2025 at 4:43 AM
@Jesse Preiner , I did a performance test (sysbench oltp_update_index) on my side.
SET GLOBAL wsrep_applier_threads = 4;
----------|---------|--- MySQL Command Status --|----- Innodb row operation -----|-- Buffer Pool Read --
---Time---|---QPS---|select insert update delete| read inserted updated deleted| logical physical
04:06:49 | 16082| 0 0 16081 0| 8015 0 8015 0| 257877 0
04:06:50 | 16246| 0 0 16251 0| 8095 0 8095 0| 261113 0
04:06:51 | 14831| 0 0 14830 0| 7358 0 7358 0| 233410 0
04:06:52 | 15147| 0 0 15140 0| 7482 0 7482 0| 240992 0
SET GLOBAL wsrep_applier_threads = 1;
----------|---------|--- MySQL Command Status --|----- Innodb row operation -----|-- Buffer Pool Read --
---Time---|---QPS---|select insert update delete| read inserted updated deleted| logical physical
04:06:53 | 11701| 0 0 11701 0| 5911 0 5911 0| 185024 0
04:06:54 | 10536| 0 0 10533 0| 5288 0 5288 0| 169812 0
04:06:55 | 10405| 0 0 10406 0| 5242 0 5242 0| 154489 0
04:06:56 | 11077| 0 0 11074 0| 5610 0 5610 0| 174756 0
04:06:57 | 11858| 0 0 11859 0| 5958 0 5958 0| 182919 0
04:06:58 | 11866| 0 0 11865 0| 5971 0 5970 0| 185644 0
Please note: the performance impact depends on your specific workload.
Checking the status should help identify the flow control behavior.
show status like '%flow%';
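A more targeted check might look like the following sketch; the counter names are the standard Galera flow-control status variables, not output captured from this cluster:

```sql
-- Sketch: watch whether the cluster pauses replication because the
-- (now single-threaded) applier falls behind.
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_flow_control_paused',   -- fraction of time paused since last FLUSH STATUS
   'wsrep_flow_control_sent',     -- flow-control pause events sent by this node
   'wsrep_flow_control_recv');    -- flow-control pause events received
```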
Details
Assignee
Kamil Holubicki
Reporter
jinyou.ma
Needs QA
Yes
Start date
May 07, 2025
Time tracking
3d logged
Sprint
Affects versions
Priority
Created April 22, 2025 at 3:39 AM
Updated 4 days ago
When DML and DDL are executed concurrently on the same table, the DML gets a deadlock error.
If the DML matches rows but does not change any data, it is not replicated. For example:
mysql> UPDATE test.t SET d = d LIMIT 1;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0
If the table has a trigger, the DML does not get a deadlock error and is replicated to the other nodes.
the trigger
CREATE TRIGGER `t_on_update` AFTER UPDATE ON `t`
FOR EACH ROW
BEGIN
  INSERT INTO t_history (`d`) VALUES (NEW.`d`);
END
the binlog event
#250422  3:27:30 server id 100  end_log_pos 413 CRC32 0x67122373  Table_map: `test`.`t` mapped to number 2519
# has_generated_invisible_primary_key=0
# at 413
#250422  3:27:30 server id 100  end_log_pos 469 CRC32 0x918aad16  Table_map: `test`.`t_history` mapped to number 132
# has_generated_invisible_primary_key=0
# at 469
#250422  3:27:30 server id 100  end_log_pos 513 CRC32 0xb2267405  Write_rows: table id 132 flags: STMT_END_F
### INSERT INTO `test`.`t_history`
### SET
###   @1=10853 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1912 /* INT meta=0 nullable=1 is_null=0 */
It requests tables t and t_history, although it only inserts a row into t_history.
When other nodes apply the DML and DDL, the applier threads will get an MDL BF-BF conflict.
The applier thread executing the DML acquired SHARED_WRITE MDL on t and t_history, for example:
*************************** 1. row ***************************
       trx_id: 109693
    trx_state: RUNNING
  trx_started: 2025-04-22 05:02:41
    trx_query: NULL
    lock_type: SHARED_WRITE
lock_duration: TRANSACTION
  lock_status: GRANTED
  object_type: TABLE
object_schema: test
  object_name: t_history
    thread_id: 11
         USER: system user
         HOST:
      COMMAND: Sleep
         TIME: 3
         INFO: NULL
*************************** 2. row ***************************
       trx_id: 109693
    trx_state: RUNNING
  trx_started: 2025-04-22 05:02:41
    trx_query: NULL
    lock_type: SHARED_WRITE
lock_duration: TRANSACTION
  lock_status: GRANTED
  object_type: TABLE
object_schema: test
  object_name: t
    thread_id: 11
         USER: system user
         HOST:
      COMMAND: Sleep
         TIME: 3
         INFO: NULL
2 rows in set, 1 warning (0.00 sec)
Another applier thread executing the DDL gets the MDL BF-BF conflict:
2025-04-22T03:04:02.251070Z 2 [Note] [MY-000000] [WSREP] MDL BF-BF conflict
schema: test
request: (thd-tid:2 seqno:2551 exec-mode:toi, query-state:exec, conflict-state:committed)
 cmd-code:3 3 query:alter table test.t engine = innodb)
granted: (thd-tid:11 seqno:2552 exec-mode:high priority, query-state:exec, conflict-state:committing)
 cmd-code:0 167 query:(null))
2025-04-22T03:04:02.251140Z 2 [Note] [MY-000000] [WSREP] MDL ticket: type: shared write, space: TABLE, db: test, name: t
2025-04-22T03:04:02.251169Z 2 [ERROR] [MY-010119] [Server] Aborting
2025-04-22T03:04:02.251179Z 2 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2025-04-22T03:04:04.251398Z 2 [Note] [MY-000000] [WSREP] Server status change synced -> disconnecting
2025-04-22T03:04:04.251427Z 1 [Note] [MY-000000] [WSREP] rollbacker thread exiting 1
the processlist
node3 [localhost:30144] {msandbox} ((none)) > show processlist;
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------+---------+-----------+---------------+
| Id | User        | Host      | db   | Command | Time | State                                       | Info                               | Time_ms | Rows_sent | Rows_examined |
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------+---------+-----------+---------------+
|  2 | system user |           |      | Killed  |  461 | After create                                | alter table test.t engine = innodb |  461646 |         0 |             0 |
| 11 | system user |           | NULL | Killed  |  461 | wsrep: replicating and certifying write set | NULL                               |  461646 |         0 |             0 |
| 14 | msandbox    | localhost | NULL | Query   |    0 | init                                        | show processlist                   |       0 |         0 |             0 |
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------+---------+-----------+---------------+
3 rows in set, 1 warning (0.00 sec)
We saw the issue when executing pt-online-schema-change.
node3 > show processlist;
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+
| Id | User        | Host      | db   | Command | Time | State                                       | Info                                                                                                 | Time_ms | Rows_sent | Rows_examined |
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+
|  2 | system user |           | test | Killed  |  309 | Opening tables                              | CREATE DEFINER=`msandbox`@`localhost` TRIGGER `pt_osc_test_parent_del` AFTER DELETE ON `test`.`paren |  308349 |         0 |             0 |
| 11 | system user |           | NULL | Killed  |  309 | wsrep: committed write set                  | NULL                                                                                                 |  308379 |         0 |             0 |
| 12 | system user |           | NULL | Killed  |  309 | wsrep: replicating and certifying write set | NULL                                                                                                 |  308352 |         0 |             0 |
| 13 | system user |           | NULL | Killed  |  309 | wsrep: committed write set                  | NULL                                                                                                 |  308360 |         0 |             0 |
| 17 | msandbox    | localhost | NULL | Query   |    0 | init                                        | show processlist                                                                                     |       0 |         0 |             0 |
+----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+
5 rows in set, 1 warning (0.00 sec)
To simplify the reproduction steps, I run the ALTER TABLE directly.
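For context, a pt-online-schema-change run that creates triggers like the `pt_osc_test_parent_del` one seen in the processlist above would look roughly like this. This is a hypothetical invocation, not the original command; the table name `parent` is inferred from the truncated trigger definition:

```shell
# Hypothetical pt-osc run; pt-osc creates AFTER INSERT/UPDATE/DELETE
# triggers (pt_osc_<db>_<table>_*) on the original table while copying rows.
pt-online-schema-change \
  --alter "ENGINE=InnoDB" \
  --execute \
  D=test,t=parent
```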
reproduce
create tables and a trigger
use test;
CREATE TABLE t (
  i int NOT NULL AUTO_INCREMENT,
  d int,
  PRIMARY KEY (i)
) ENGINE=InnoDB;
INSERT INTO t values (1,1);
CREATE TABLE t_history like t;
\d //
CREATE TRIGGER `t_on_update` AFTER UPDATE ON `t`
FOR EACH ROW
BEGIN
  INSERT INTO t_history (`d`) VALUES (NEW.`d`);
END //
DELIMITER ;
session 1 executes DML
for i in {1..1000}; do
  ./n1 -BNe "UPDATE test.t SET d = d LIMIT 1;"
done
session 2 executes DDL
for i in {1..1000}; do
  ./n1 -BNe "ALTER TABLE test.t ENGINE = InnoDB;"
done