MDL BF-BF conflict between two applier threads

Description

When executing DML and DDL on the same table, the DML will get a deadlock error.

If the DML does not change the data but matches, it won’t be replicated, for example.

mysql > UPDATE test.t SET d = d LIMIT 1; Query OK, 0 rows affected (0.01 sec) Rows matched: 1 Changed: 0 Warnings: 0

If the table contains a trigger, the DML does not get a deadlock error and will be replicated to other nodes.

  • the trigger

CREATE TRIGGER `t_on_update` AFTER UPDATE ON `t` FOR EACH ROW BEGIN INSERT INTO t_history (`d`) VALUES (NEW.`d`); END
  • the binlog event

#250422 3:27:30 server id 100 end_log_pos 413 CRC32 0x67122373 Table_map: `test`.`t` mapped to number 2519 # has_generated_invisible_primary_key=0 # at 413 #250422 3:27:30 server id 100 end_log_pos 469 CRC32 0x918aad16 Table_map: `test`.`t_history` mapped to number 132 # has_generated_invisible_primary_key=0 # at 469 #250422 3:27:30 server id 100 end_log_pos 513 CRC32 0xb2267405 Write_rows: table id 132 flags: STMT_END_F ### INSERT INTO `test`.`t_history` ### SET ### @1=10853 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1912 /* INT meta=0 nullable=1 is_null=0 */

It requests tables t and t_history, although it only inserts a row into t_history

When other nodes apply the DML and DDL, the applier threads will get an MDL BF-BF conflict.

The applier executing the DML query got SHARED_WRITE MDL on t and t_history, for example.

*************************** 1. row *************************** trx_id: 109693 trx_state: RUNNING trx_started: 2025-04-22 05:02:41 trx_query: NULL lock_type: SHARED_WRITE lock_duration: TRANSACTION lock_status: GRANTED object_type: TABLE object_schema: test object_name: t_history thread_id: 11 USER: system user HOST: COMMAND: Sleep TIME: 3 INFO: NULL *************************** 2. row *************************** trx_id: 109693 trx_state: RUNNING trx_started: 2025-04-22 05:02:41 trx_query: NULL lock_type: SHARED_WRITE lock_duration: TRANSACTION lock_status: GRANTED object_type: TABLE object_schema: test object_name: t thread_id: 11 USER: system user HOST: COMMAND: Sleep TIME: 3 INFO: NULL 2 rows in set, 1 warning (0.00 sec)

Another applier thread executing the DDL will get the MDL BF-BF conflict.

2025-04-22T03:04:02.251070Z 2 [Note] [MY-000000] [WSREP] MDL BF-BF conflict schema: test request: (thd-tid:2 seqno:2551 exec-mode:toi, query-state:exec, conflict-state:committed) cmd-code:3 3 query:alter table test.t engine = innodb) granted: (thd-tid:11 seqno:2552 exec-mode:high priority, query-state:exec, conflict-state:committing) cmd-code:0 167 query:(null)) 2025-04-22T03:04:02.251140Z 2 [Note] [MY-000000] [WSREP] MDL ticket: type: shared write, space: TABLE, db: test, name: t 2025-04-22T03:04:02.251169Z 2 [ERROR] [MY-010119] [Server] Aborting 2025-04-22T03:04:02.251179Z 2 [Note] [MY-000000] [WSREP] Initiating SST cancellation 2025-04-22T03:04:04.251398Z 2 [Note] [MY-000000] [WSREP] Server status change synced -> disconnecting 2025-04-22T03:04:04.251427Z 1 [Note] [MY-000000] [WSREP] rollbacker thread exiting 1
  • the processlist

node3 [localhost:30144] {msandbox} ((none)) > show processlist; +----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------+---------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined | +----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------+---------+-----------+---------------+ | 2 | system user | | | Killed | 461 | After create | alter table test.t engine = innodb | 461646 | 0 | 0 | | 11 | system user | | NULL | Killed | 461 | wsrep: replicating and certifying write set | NULL | 461646 | 0 | 0 | | 14 | msandbox | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 | +----+-------------+-----------+------+---------+------+---------------------------------------------+------------------------------------+---------+-----------+---------------+ 3 rows in set, 1 warning (0.00 sec)

We saw the issue when we execute pt-online-schema-change.

node3 > show processlist; +----+-----------------+-----------+------+---------+------+---------------------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+ | Id | User            | Host      | db   | Command | Time | State                                       | Info                                                                                                 | Time_ms | Rows_sent | Rows_examined | +----+-----------------+-----------+------+---------+------+---------------------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+ |  2 | system user     |           | test | Killed  |  309 | Opening tables                              | CREATE DEFINER=`msandbox`@`localhost` TRIGGER `pt_osc_test_parent_del` AFTER DELETE ON `test`.`paren |  308349 |         0 |             0 | | 11 | system user     |           | NULL | Killed  |  309 | wsrep: committed write set                  | NULL                                                                                                 |  308379 |         0 |             0 | | 12 | system user     |           | NULL | Killed  |  309 | wsrep: replicating and certifying write set | NULL                                                                                                 |  308352 |         0 |             0 | | 13 | system user     |           | NULL | Killed  |  309 | wsrep: committed write set                  | NULL                                                                                                 |  308360 |         0 |             0 | | 17 | msandbox        | localhost | NULL | Query   |    0 | init                                        | show processlist                                                                                     |       0 |         0 |             0 | +----+-----------------+-----------+------+---------+------+---------------------------------------------+------------------------------------------------------------------------------------------------------+---------+-----------+---------------+ 5 rows in set, 1 warning (0.00 sec)

To simplify the reproduction step, I run the ALTER TABLE directly.

reproduce

  • create tables and a trigger

use test; CREATE TABLE t ( i int NOT NULL AUTO_INCREMENT, d int, PRIMARY KEY (i) ) ENGINE=InnoDB ; INSERT INTO t values (1,1); CREATE TABLE t_history like t; \d // CREATE TRIGGER `t_on_update` AFTER UPDATE ON `t` FOR EACH ROW BEGIN INSERT INTO t_history (`d`) VALUES (NEW.`d`); END // DELIMITER ;
  • session 1 executes DML

for i in {1..1000};do { ./n1 -BNe "UPDATE test.t SET d = d LIMIT 1;" } done
  • session 2 executes DDL

for i in {1..1000};do { ./n1 -BNe "alter table test.t engine = innodb;" } done

Environment

None

AFFECTED CS IDs

CS0053911

Activity

jinyou.ma 
4 days ago

Hi , it should not be part of the escalation. Let me create a new one.

jinyou.ma 
last week

The issue can also be reproduced without triggers:

  • create table

USE test; CREATE TABLE t1 ( i int NOT NULL AUTO_INCREMENT, d int, PRIMARY KEY (i) ) ENGINE=InnoDB ; CREATE TABLE t2 like t1; INSERT INTO t2 values (1,1); INSERT INTO t1 values (1,1);
  • Session 1 executes DML

for i in {1..1000};do { ./n1 -BNe "UPDATE test.t2 join test.t1 using (i) SET t2.d = t2.d+1, t1.d = t1.d;" } done
  • Session 2 executes DDL

for i in {1..1000};do { ./n1 -BNe "alter table test.t1 engine = innodb;" } done

Kamil Holubicki 
April 29, 2025 at 6:41 AM

I think the workaround has to be applied only for the time of DDL execution. After that you can get back to wsrep_applier_threads=4

jinyou.ma 
April 25, 2025 at 1:35 AM

There is a workaround by using a single wsrep applier thread.

SET GLOBAL wsrep_applier_threads = 1;

jinyou.ma 
April 29, 2025 at 4:43 AM

, I did a performance test (sysbench oltp_update_index) on my side.

  • SET GLOBAL wsrep_applier_threads = 4

----------|---------|--- MySQL Command Status --|----- Innodb row operation -----|-- Buffer Pool Read -- ---Time---|---QPS---|select insert update delete| read inserted updated deleted| logical physical 04:06:49 | 16082| 0 0 16081 0| 8015 0 8015 0| 257877 0 04:06:50 | 16246| 0 0 16251 0| 8095 0 8095 0| 261113 0 04:06:51 | 14831| 0 0 14830 0| 7358 0 7358 0| 233410 0 04:06:52 | 15147| 0 0 15140 0| 7482 0 7482 0| 240992 0
  • SET GLOBAL wsrep_applier_threads = 1;

----------|---------|--- MySQL Command Status --|----- Innodb row operation -----|-- Buffer Pool Read -- ---Time---|---QPS---|select insert update delete| read inserted updated deleted| logical physical 04:06:53 | 11701| 0 0 11701 0| 5911 0 5911 0| 185024 0 04:06:54 | 10536| 0 0 10533 0| 5288 0 5288 0| 169812 0 04:06:55 | 10405| 0 0 10406 0| 5242 0 5242 0| 154489 0 04:06:56 | 11077| 0 0 11074 0| 5610 0 5610 0| 174756 0 04:06:57 | 11858| 0 0 11859 0| 5958 0 5958 0| 182919 0 04:06:58 | 11866| 0 0 11865 0| 5971 0 5970 0| 185644 0

Please note: the performance impact depends on your specific workload.

 

Checking the status should help identify the flow control behavior.

show status like '%flow%';

Details

Assignee

Reporter

Needs QA

Start date

Time tracking

3d logged

Fix versions

Affects versions

Priority

Created April 22, 2025 at 3:39 AM
Updated 4 days ago

Flag notifications