Duplicate auto_increment values set for concurrent sessions on cluster reconfiguration

Description

For a given example table:

During a concurrent multi-session insert test, where the auto-increment primary key column is left unset and the other unique columns always receive unique values, conflicting auto_increment values are occasionally assigned when the cluster is resized (for example, when nodes leave or join).

With INSERT INTO ... ON DUPLICATE KEY UPDATE, this leads to overwriting unique rows, hence losing data!

Example binary log event:

Example scripts for test case, error log and binlog attached.

Reproduced on both PXC 5.7.19 and 5.7.21, in a 3-node cluster and a 5-node cluster.

Environment

None

AFFECTED CS IDs

224754

Attachments

4

Activity

Krunal Bauskar June 13, 2018 at 10:06 AM

commit aa3fd356ac2fd9e32cf980be2bab830ed7724e09
Merge: d0961888d47 675c863c2ca
Author: Krunal Bauskar <krunal.bauskar@percona.com>
Date: Wed Jun 13 15:32:52 2018 +0530

Merge pull request #650 from percona/pxc-5.6-2128

  • PXC#2128: Duplicate auto_increment values set for concurrent

commit 675c863c2ca7a44b69399ae3793e92c60ba9a62e

Author: Krunal Bauskar <krunal.bauskar@percona.com>
Date: Tue Jun 12 21:04:15 2018 +0530

  • PXC#2128: Duplicate auto_increment values set for concurrent
    sessions on cluster reconfiguration


Issue:


  • node leaving cluster can cause dynamic adjustment to auto-increment
    increment and offset based on nodes in primary component.

  • this can affect the auto-inc series and innodb/mysql will need to
    re-calculate auto-inc series based on updated values.

  • with auto-inc re-configuration, auto-inc series is re-calculated.
    for example with off/inc = 3/3 series is say 27, 30, 33, 36, 39,....
    but let's say after 33 off/inc changes to 2/2 then series needs
    to be re-evaluated to adjust to new values.

  • when mysql flow detects a decrease in increment it needs to step
    back and re-evaluate the series. for example: 33 is inserted and
    the next value to insert is 36, but if the configuration changes
    then a standalone server will re-construct 33 (36 - 3, the old
    increment) and, using 33 as base, project the next number in the
    series to insert.

  • in pxc, with wsrep_auto_increment_control=on, if 33 is inserted
    and off=3/incr=3 (3 node cluster) then the next number for the said
    node is 36. pxc can't backtrack to the old number because that could
    generate a number in the range 34-35, which could conflict with
    inserts running on node-2 and node-3 that have 34 and 35 reserved.

  • pxc skips back-tracking and instead uses the next number in the
    series. unfortunately, this part of the code, though it skipped
    back-tracking (readjustment), continued to re-calibrate the number.
    a standalone server uses 33 as base and so needs re-calibration;
    pxc continues to use 36 as base, so the pxc flow doesn't need
    re-calibration because it retained the original number.

  • this re-calibration caused generation of the same value for the
    current insert and the next-value insert.
    (mysql flow identifies this use-case and moves next-value to the
    next legitimate value, but only after the insert is done, on the
    success path).

  • in the meantime, if another thread uses this next-value and proceeds
    with its insert, it will end up inserting next-value (assuming it is
    the next value to consume), and the current thread that generated
    curr-value = next-value will also insert the same value, resulting
    in a duplicate-key error.


Fix:

  • Since pxc doesn't re-adjust the autoinc on decrease in incr, pxc
    should avoid re-calibration.
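
The arithmetic in the commit message can be illustrated with a small Python sketch. It is not the InnoDB code: `next_in_series` is a hypothetical helper that assumes generated values have the form offset + k * increment, and the offsets given to the other two nodes (1 and 2) are inferred from the values 34 and 35 mentioned above.

```python
def next_in_series(current, offset, increment):
    """Smallest value strictly greater than `current` of the form
    offset + k * increment (k >= 0). Hypothetical helper, not InnoDB code."""
    if current < offset:
        return offset
    k = (current - offset) // increment + 1
    return offset + k * increment


# 3-node cluster: this node uses off/inc = 3/3 and has just inserted 33.
last_inserted = 33
old_increment = 3
reserved_next = next_in_series(last_inserted, offset=3, increment=3)

# Values the other two nodes would use next (offsets 1 and 2 are assumed).
reserved_elsewhere = {
    next_in_series(last_inserted, offset=1, increment=3),
    next_in_series(last_inserted, offset=2, increment=3),
}

# A node leaves and off/inc becomes 2/2.
# Standalone behaviour: step back by the old increment, then project forward.
base = reserved_next - old_increment
standalone_next = next_in_series(base, offset=2, increment=2)

# PXC behaviour as described in the commit: keep the reserved value as is.
pxc_next = reserved_next

print("reserved next value:", reserved_next)
print("standalone would pick:", standalone_next)
print("collides with another node:", standalone_next in reserved_elsewhere)
print("pxc keeps:", pxc_next)
```

Under these assumptions the back-tracked value lands inside the range that the other nodes may already have reserved, which is why PXC skips the step back.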

Krunal Bauskar June 12, 2018 at 1:13 PM

We have successfully reproduced the issue (using a shorter 10 line test-case)
and also found the root-cause of the issue.

In short, on an auto-inc configuration change, pxc skips the readjustment of
the auto-inc value, as it could potentially conflict with another node's range.

(Say the next autoinc value = x with incr = y; a standalone server will use x-y
as the base value and re-calibrate it with the new auto-inc configuration, which
could potentially generate a value in the range (x-y, x-1).)

PXC skipped this part of the code but accidentally retained the calibration
that is done after readjustment. Since readjustment is skipped, there is no
need for re-calibration.

This unneeded re-calibration caused the issue by generating a duplicate
value for the current value and the next-value.
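
The effect of "current value equals next-value" on two concurrent sessions can be modelled with a deliberately simplified Python sketch. `AutoIncCounter`, `reserve`, `after_insert` and `run` are hypothetical names for illustration only; the model assumes a shared per-table counter that is fixed up on the success path of an insert, and it replays one fixed interleaving rather than using real threads.

```python
class AutoIncCounter:
    """Toy model of a shared per-table auto-increment counter (hypothetical)."""

    def __init__(self, start, increment):
        self.next_value = start
        self.increment = increment

    def reserve(self, advance):
        """Hand out the current next_value. With advance=False the counter
        is left pointing at the value just handed out (the buggy state)."""
        value = self.next_value
        if advance:
            self.next_value = value + self.increment
        return value

    def after_insert(self, value):
        """Success-path fix-up: move next_value past an inserted value."""
        if self.next_value <= value:
            self.next_value = value + self.increment


def run(advance_on_reserve):
    counter = AutoIncCounter(start=36, increment=2)
    a = counter.reserve(advance_on_reserve)  # session A picks its value
    b = counter.reserve(advance_on_reserve)  # session B runs before A's insert completes
    counter.after_insert(a)
    counter.after_insert(b)
    return a, b, counter.next_value


print("expected flow:", run(advance_on_reserve=True))
print("buggy flow:   ", run(advance_on_reserve=False))
```

In the buggy flow both sessions receive the same value, because the fix-up only happens after the first insert has succeeded; that window is what the second session slips into.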

---------------

Currently I am working on the fix that will be followed by local-testing.
Once ready I will post another update.

Krunal Bauskar June 11, 2018 at 11:46 AM

Just to post an update ...

We have the setup ready and we are investigating the issue.

We suspect that the code that re-aligns the auto-increment series for Galera (on cluster membership change) is causing the issue. In fact, the code was consciously edited by upstream to skip some logic that is normally used in standalone mode (of InnoDB). We continue to evaluate whether this decision to skip the logic makes sense.

Unfortunately, it doesn't look like anything is missing; it is more likely a bug in the existing flow, so we need to understand why the said logic was added by upstream.

Iwo Panowicz June 4, 2018 at 3:30 PM

Another test case:

0. Spawn 2 PXC 5.7 instances with:
 - wsrep_retry_autocommit=1,
 - wsrep_auto_increment_control=ON.

1. Run garbd in a loop:

2. Run on many threads (4 was enough for me):

After some time it will return

which means that collisions were detected. The following can also be found in the binlogs:

Also, from time to time, it returns 

which in my opinion is correct behaviour.

 

Done

Details

Time tracking

3d 4h 4m logged

Created June 4, 2018 at 3:26 PM
Updated March 6, 2024 at 10:31 PM
Resolved June 13, 2018 at 10:07 AM