Long semaphore wait crash because ha_commit_low does not commit an empty transaction

Description

Crash log

The SEMAPHORES section of the InnoDB status shows threads waiting on an RW-latch

The RW-latch is held by a single thread

Reproduce

Deploy one PXC cluster and one replication cluster

Connect the two clusters with asynchronous replication

Run SQL to crash the PXC cluster

When the output pauses, check the InnoDB status on node 1 of the PXC cluster with the command below.

Root cause

Because ha_list is null in ha_commit_low, the thread never calls ht->commit, which is the call that pops the thd from wsrep_group_commit_queue.

The thd therefore remains the first element of wsrep_group_commit_queue. Other threads that call wsrep_wait_for_turn_in_group_commit then wait on the condition COND_wsrep_group_commit until MySQL crashes.
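
The blocking described above can be illustrated with a small single-threaded model. This is only a sketch, not the PXC source: `GroupCommitQueue`, `commit_low`, and `blocked_after_empty_commit` are hypothetical names, and the condition-variable wait is reduced to a boolean "would block" result so the example stays deterministic.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for wsrep_group_commit_queue: entries must leave
// in the same order in which they registered.
class GroupCommitQueue {
 public:
  void register_thd(int xid) { queue_.push_back(xid); }

  // Models the wait predicate: a committer may proceed only at the head.
  bool is_my_turn(int xid) const {
    return !queue_.empty() && queue_.front() == xid;
  }

  // Models the pop performed inside the engine commit hook.
  void leave(int xid) {
    assert(is_my_turn(xid));
    queue_.pop_front();
  }

 private:
  std::deque<int> queue_;
};

// Simplified model of ha_commit_low: the engine hook, and with it the pop,
// runs only when ha_list is not empty. Returns false if the caller would
// block waiting for its turn.
bool commit_low(GroupCommitQueue& q, int xid, bool ha_list_empty) {
  if (ha_list_empty) return true;        // hook skipped, entry stays queued
  if (!q.is_my_turn(xid)) return false;  // would wait on the condition
  q.leave(xid);
  return true;
}

// Replays three registered transactions where the second one is empty and
// returns the Xids that would block.
std::vector<int> blocked_after_empty_commit() {
  GroupCommitQueue q;
  for (int xid : {4302, 4303, 4304}) q.register_thd(xid);

  std::vector<int> blocked;
  if (!commit_low(q, 4302, false)) blocked.push_back(4302);
  if (!commit_low(q, 4303, true)) blocked.push_back(4303);  // empty transaction
  if (!commit_low(q, 4304, false)) blocked.push_back(4304);
  return blocked;
}
```

In this model the empty transaction returns without leaving the queue, so every later committer finds a stale entry at the head and can never take its turn.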

The trace below was captured by adding breakpoints:

Id 21 [ Xid 4302 ] register
Id 15 [ Xid 4303 ] register
Id 16 [ Xid 4304 ] register
Id 17 [ Xid 4305 ] register
Id 21 [ Xid 4302 ] enters ha_commit_low
Id 18 [ Xid 4306 ] register
Id 21 [ Xid 4302 ] ha_list is not null
Id 21 [ Xid 4302 ] innobase_commit
Id 21 [ Xid 4302 ] wait
Id 21 [ Xid 4302 ] leaves
Id 15 [ Xid 4303 ] enters ha_commit_low
Id 16 [ Xid 4304 ] enters ha_commit_low
Id 16 [ Xid 4304 ] ha_list is not null
Id 16 [ Xid 4304 ] innobase_commit
Id 16 [ Xid 4304 ] wait
Id 22 [ Xid 4307 ] register

Xid 4303 enters ha_commit_low but never calls innobase_commit.
As a result, Xid 4303 is never popped from wsrep_group_commit_queue.
When Xid 4304 commits, it waits on the condition.
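
The same conclusion can be reached mechanically from the trace. The sketch below assumes the line shape shown above (`Id <n> [ Xid <n> ] <event>`); `entered_without_engine_commit` and `report_trace` are hypothetical helper names introduced for illustration only.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the Xids that entered ha_commit_low but never reached
// innobase_commit in the given trace.
std::set<int> entered_without_engine_commit(
    const std::vector<std::string>& trace) {
  std::set<int> entered, committed;
  for (const std::string& line : trace) {
    std::istringstream in(line);
    std::string tok;
    int id = 0, xid = 0;
    // Expected tokens: "Id" <id> "[" "Xid" <xid> "]" followed by the event.
    if (!(in >> tok >> id >> tok >> tok >> xid >> tok)) continue;
    std::string event;
    std::getline(in >> std::ws, event);
    if (event == "enters ha_commit_low") entered.insert(xid);
    if (event == "innobase_commit") committed.insert(xid);
  }
  std::set<int> stuck;
  for (int xid : entered) {
    if (committed.count(xid) == 0) stuck.insert(xid);
  }
  return stuck;
}

// The trace lines from this report, copied verbatim.
std::vector<std::string> report_trace() {
  return {
      "Id 21 [ Xid 4302 ] register",
      "Id 15 [ Xid 4303 ] register",
      "Id 16 [ Xid 4304 ] register",
      "Id 17 [ Xid 4305 ] register",
      "Id 21 [ Xid 4302 ] enters ha_commit_low",
      "Id 18 [ Xid 4306 ] register",
      "Id 21 [ Xid 4302 ] ha_list is not null",
      "Id 21 [ Xid 4302 ] innobase_commit",
      "Id 21 [ Xid 4302 ] wait",
      "Id 21 [ Xid 4302 ] leaves",
      "Id 15 [ Xid 4303 ] enters ha_commit_low",
      "Id 16 [ Xid 4304 ] enters ha_commit_low",
      "Id 16 [ Xid 4304 ] ha_list is not null",
      "Id 16 [ Xid 4304 ] innobase_commit",
      "Id 16 [ Xid 4304 ] wait",
      "Id 22 [ Xid 4307 ] register",
  };
}
```

Calling `entered_without_engine_commit(report_trace())` singles out the one transaction that entered ha_commit_low without a matching innobase_commit line, which is the entry left at the head of the queue.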

Environment

None

AFFECTED CS IDs

CS0040273, CS0041591

Done

Details

Needs QA

Yes

Created October 18, 2023 at 7:39 AM
Updated June 6, 2024 at 8:02 AM
Resolved January 16, 2024 at 5:56 PM