Alter table online results in "duplicate key" error on the primary key (only index)

Description

There have been many bugs reported here and in MySQL bugs regarding duplicate key error while doing an online alter table operation on tables with a primary and unique keys indexes. Here’s the bug is not as easy to reproduce but involves ONLY the primary key and includes an atomic sequence that cannot create duplicate. It seems to be related to pages splits/merges.

The table definition is quite simple:

k is a UUID value from a set of possible values, version is a sequence starting at 1, value is just a random sized blob. The attached repro_bug.go program generates a set of 20k k values and then starts inserting rows with 4 threads. There are also 2 threads doing deletes, deleting all versions of a randomly chosen key smaller than the current one. I also attached a small bash script to repeatedly run the go program. In my tests the error occurs about 3% of the time. When the error occurs, you’ll see something like:

Over the 90s duration of a run a bit more than 400k rows are inserted. The current sequence printed when an error is encountered. At the moment of the crash, the sequence was at 411974 while the row causing the duplicate key had a sequence of 329982. That means the offending row was inserted nearly 18s before the alter was execution (compaction). That was way before the alter was executed.

The server was running with:

  • Innodb_doublewrite = 0

  • innodb_flush_log_at_trx_commit = 0

  • innodb_buffer_pool_size = 4G

  • sync_binlog = 0

As a final word, I suspect it affects many versions, not only 8.0.35, but I only tested against 8.0.35.

Environment

None

AFFECTED CS IDs

CS0046644

Attachments

2

Smart Checklist

Activity

Show:

Dmitry Lenev September 3, 2024 at 3:04 PM

Fix has been pushed into 8.0.39 and 8.4.2 release trees.

Dmitry Lenev August 1, 2024 at 3:15 PM

Tentative fix for this issue has been submitted as a contribution to Upstream.

Dmitry Lenev July 4, 2024 at 11:47 AM

Reported issue to the Upstream as a bug against MySQL 8.0.38

Dmitry Lenev July 4, 2024 at 8:27 AM

Hello!
After investigation I was able to produce deterministic test case for the problem using DEBUG flag and a new DEBUG_SYNC point.
To reproduce the problem one needs to apply the below patch first:


And then run the following test case against debug build of the server:


Initial analysis shows that the issue seem to be related to the way in which we save/restore cursor positions when performing “parallel” table scan during inplace ALTER TABLE index rebuild phase.

Aaditya Dubey June 3, 2024 at 10:20 AM

Hi

Thank you for the update.

Done

Details

Assignee

Reporter

Labels

Needs QA

Yes

In progress time

6.6

Time tracking

No time logged2w 1d 5h remaining

Sprint

Priority

Smart Checklist

Created April 25, 2024 at 8:27 PM
Updated December 23, 2024 at 11:39 AM
Resolved September 3, 2024 at 3:04 PM