MySQL crash caused by a negative index passed to get_mutex_cond in group replication

Description

MySQL crashed while accessing an invalid memory address in inline_mysql_mutex_lock.

 

The backtrace is

The invalid address in frame 3 (inline_mysql_mutex_lock) is shown below:

The root cause is that a negative index (-1) is used to read from an array.

Because n is -1, ret is invalid. n should always be a positive number.
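
To illustrate why n = -1 yields an invalid ret, here is a minimal sketch. It is not the MySQL source: MutexCond, MutexCondArray, and get_checked are hypothetical names, and the sketch uses a zero-based array for simplicity. It only shows that a range check turns an out-of-range index into a detectable error instead of a read outside the array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-sidno mutex/condition pair.
struct MutexCond {
  int id;
};

// Hypothetical stand-in for the array that is indexed by n.
class MutexCondArray {
 public:
  explicit MutexCondArray(int count) {
    for (int i = 0; i < count; ++i) items_.push_back(MutexCond{i});
  }

  // Returns nullptr for an out-of-range index. An unchecked items_[n]
  // with n == -1 would be undefined behavior and can produce a pointer
  // to arbitrary memory, as seen in the crash.
  const MutexCond *get_checked(int n) const {
    if (n < 0 || static_cast<std::size_t>(n) >= items_.size()) {
      return nullptr;
    }
    return &items_[static_cast<std::size_t>(n)];
  }

 private:
  std::vector<MutexCond> items_;
};
```

With such a check, a caller that receives nullptr can fail the commit cleanly rather than locking a mutex at an invalid address.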

In frame 6, automatic_gtid.sidno is -1.

In MYSQL_BIN_LOG::process_flush_stage_queue, m_sidno is -1.

This is because group replication sets m_sidno to -1.

 

Environment

None

AFFECTED CS IDs

CS0044430, CS0045039, CS0052291

Activity

Venkatesh Prasad 
April 22, 2024 at 8:49 AM
(edited)

Code analysis shows that m_sidno is set to -1 in two places:

  1. In handle_transaction_id():

  2. In Blocked_transaction_handler::unblock_waiting_transactions():

 


1. handle_transaction_id()

When a client thread commits a transaction, it first calls the group_replication_before_commit hook for certification, where a group GTID is assigned. The client thread waits in transaction_latch->waitTicket() while the applier thread performs the certification, fills in the waiting thread's transaction_context structure, and signals it to proceed.

If the applier thread does not see an assigned seq_number, it sets m_rollback_transaction = true and m_sidno = -1, then signals the waiting thread to perform a rollback.

 

Case 1: transaction is positively certified, seq_number > 0
Behavior: seq_number is set in THD::transaction_context

Case 2: transaction is negatively certified, seq_number = 0
Behavior: rollback_transaction is set and m_sidno is set to -1.

Conclusion: There is nothing wrong in handle_transaction_id().
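
The two cases can be sketched as follows. This is a simplified illustration, not the plugin code: TransactionContext, m_seq_number, and apply_certification_result are hypothetical names, and only the fields mentioned in this analysis are modeled.

```cpp
#include <cassert>

// Hypothetical, reduced view of the per-transaction context.
struct TransactionContext {
  bool m_rollback_transaction = false;
  int m_sidno = 0;
  long long m_seq_number = 0;  // assumed field name for the sequence number
};

// Sketch of the applier-side decision described in the two cases.
inline void apply_certification_result(TransactionContext &ctx,
                                       long long seq_number) {
  if (seq_number > 0) {
    // Case 1: positively certified, record the sequence number.
    ctx.m_seq_number = seq_number;
  } else {
    // Case 2: negatively certified, request a rollback and mark
    // the sidno as invalid.
    ctx.m_rollback_transaction = true;
    ctx.m_sidno = -1;
  }
}
```

In this model, m_sidno = -1 always comes together with m_rollback_transaction = true, so a waiter that checks the rollback flag never needs to look at the sidno.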


2. unblock_waiting_transactions()

 

This function also sets the waiting transaction's m_rollback_transaction before unblocking it.

Conclusion: There is nothing wrong in unblock_waiting_transactions().



However, in the core dump, the transaction_context of the crashing thread has m_rollback_transaction set to true, which means the unblock succeeded.

So the only possibility I see is that the client thread passed certification and proceeded with the commit, but before it reached generate_automatic_gtid(), another thread called unblock_waiting_transactions() (possibly as part of Plugin_gcs_events_handler::was_member_expelled_from_group()->leave_group_on_failure::leave()), which resulted in the crash.
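
The suspected interleaving can be written out as a deterministic sketch. This is only a model of the hypothesis above, not server code: PendingCommit and the three functions are hypothetical names, and the real steps run on different threads.

```cpp
#include <cassert>

// Hypothetical state shared between the committing thread and the
// thread that unblocks waiting transactions.
struct PendingCommit {
  bool m_rollback_transaction = false;
  int m_sidno = 1;  // assumed valid sidno after positive certification
};

// Step 1 (commit thread): check made right after the certification wait.
inline bool passed_certification(const PendingCommit &c) {
  return !c.m_rollback_transaction;
}

// Step 2 (other thread): unblock marks the transaction for rollback.
inline void unblock_waiting_transaction(PendingCommit &c) {
  c.m_rollback_transaction = true;
  c.m_sidno = -1;
}

// Step 3 (commit thread): state seen later, at GTID generation time.
// Returns false when the sidno can no longer be used as an index.
inline bool sidno_usable_for_gtid(const PendingCommit &c) {
  return !c.m_rollback_transaction && c.m_sidno > 0;
}
```

If step 2 runs between steps 1 and 3, the commit thread has already decided to proceed but later reads m_sidno = -1, which matches the state seen in the core dump.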

 

A few observations:

  1. The member was expelled due to a network failure and the server was set to the ERROR state

  2. New transactions failed with

  3. Existing transactions failed with

  4. In the end, there was also a problem while unblocking waiting threads in unblock_waiting_transactions()

    This is most likely due to the return value of releaseTicket(), which indicates that the key was not found in the map. (Need to check why.)
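
As a rough model of observation 4, the sketch below shows how a release call can report "key not found". It assumes, as the comment above suggests, that a non-zero return means the key was missing. TicketRegistry and its methods are hypothetical names, not the plugin's ticket class.

```cpp
#include <cassert>
#include <set>

// Hypothetical registry of threads waiting on a ticket, keyed by thread id.
class TicketRegistry {
 public:
  void register_ticket(unsigned long key) { tickets_.insert(key); }

  // Returns 0 on success and 1 when the key is not in the map, for
  // example because the ticket was already released by another path.
  int release_ticket(unsigned long key) {
    return tickets_.erase(key) == 1 ? 0 : 1;
  }

 private:
  std::set<unsigned long> tickets_;
};
```

In this model, a second release of the same key is one way to get the error, which would fit a transaction that had already left the wait before the unblock ran.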

jinyou.ma 
April 18, 2024 at 2:21 AM

I'm still unable to reproduce this issue. I will send a message to get the cluster environment.

jinyou.ma 
April 16, 2024 at 5:51 AM

I have not found a way to reproduce this issue. I will try tomorrow and provide more information.

On the customer side, the issue occurred after unblock_waiting_transactions.

Cannot Reproduce

Created March 4, 2024 at 4:33 AM
Updated April 14, 2025 at 1:54 PM
Resolved April 14, 2025 at 1:54 PM