GTID inconsistency when joining 2 nodes at once

Description

During testing I discovered a strange issue when joining 2 PXC nodes to a cluster at the same time.

The second node, which has to wait in line for the donor to become available, will use a different UUID in its GTIDs for all new transactions after joining.

I can reproduce it every time in my docker testing setup starting with version 8.0.35 by executing these steps:

Bootstrap node1

Start node2 and node3 at the same time

After both are done: check gtid_executed. It will be the same on all 3 servers.

Execute a write query.

Check gtid_executed again: node1 and node2 will match, node3 will have a new GTID set.
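
The comparison in the last step can be automated. Below is a minimal Python sketch that parses gtid_executed values and reports source UUIDs that appear on one node but not on a reference node. The function names and the UUIDs in the usage note are hypothetical and only illustrate the format; it assumes the classic `uuid:start-end` GTID set syntax used by 8.0.

```python
def parse_gtid_set(text):
    """Parse a gtid_executed string into {source_uuid: [(start, end), ...]}."""
    result = {}
    # MySQL may break long sets across lines, so drop all whitespace first.
    compact = "".join(text.split())
    if not compact:
        return result
    for entry in compact.split(","):
        # Each entry looks like "uuid:1-5:7": a UUID followed by intervals.
        uuid, *intervals = entry.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single transaction ("7") is stored as the range (7, 7).
            ranges.append((int(start), int(end or start)))
    return result


def unexpected_uuids(reference, other):
    """Return source UUIDs present in `other` but missing from `reference`."""
    return sorted(set(parse_gtid_set(other)) - set(parse_gtid_set(reference)))
```

For example, with made-up values, `unexpected_uuids("aaaaaaaa-0000-0000-0000-000000000001:1-12", "aaaaaaaa-0000-0000-0000-000000000001:1-11,\nbbbbbbbb-0000-0000-0000-000000000002:1")` returns the second UUID only, which is the pattern described for node3 after the write query.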

Config used on all 3 nodes:

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
log-slave-updates=1

skip-host-cache
skip-name-resolve

server-id=1

gtid_mode=on
enforce_gtid_consistency=on

log_bin
binlog_format=row
slave_parallel_workers=1

Nothing special in the docker-compose either:

---
version: '3'
networks:
  xdb:
    driver: bridge
services:
  node1:
    container_name: xtradb_node1
    image: percona/percona-xtradb-cluster:8.0
    environment:
      - MYSQL_ROOT_PASSWORD=root
      - MYSQL_DATABASE=galera_test_db
      - MYSQL_USER=galera_test_db
      - MYSQL_PASSWORD=galera_test_db
      - CLUSTER_NAME=pxc-cluster
      - CLUSTER_JOIN=
      - PERCONA_TELEMETRY_DISABLE=1
    volumes:
      - ./8.0-cluster-node-conf.d:/etc/percona-xtradb-cluster.conf.d
      - ./certs:/certs
    networks:
      xdb: {}
  node2:
    container_name: xtradb_node2
    image: percona/percona-xtradb-cluster:8.0
    environment:
      - CLUSTER_NAME=pxc-cluster
      - CLUSTER_JOIN=node1,node2,node3
      - PERCONA_TELEMETRY_DISABLE=1
    depends_on:
      - node1
    volumes:
      - ./8.0-cluster-node-conf.d:/etc/percona-xtradb-cluster.conf.d
      - ./certs:/certs
    networks:
      xdb: {}
  node3:
    container_name: xtradb_node3
    image: percona/percona-xtradb-cluster:8.0
    environment:
      - CLUSTER_NAME=pxc-cluster
      - CLUSTER_JOIN=node1,node2,node3
      - PERCONA_TELEMETRY_DISABLE=1
    depends_on:
      - node1
    volumes:
      - ./8.0-cluster-node-conf.d:/etc/percona-xtradb-cluster.conf.d
      - ./certs:/certs
    networks:
      xdb: {}
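
For reference, here is a rough Python sketch for collecting gtid_executed from the three containers defined above. It assumes the `mysql` client inside each container can log in as root with the password `root` (set on node1 and presumably carried to the joiners by SST); the helper names are hypothetical.

```python
import subprocess

# Container names taken from the docker-compose file above.
NODES = ["xtradb_node1", "xtradb_node2", "xtradb_node3"]


def build_query_command(container, password="root"):
    """Build the docker exec command that prints gtid_executed on one line."""
    # REPLACE strips the newlines MySQL inserts between UUIDs in long sets.
    sql = "SELECT REPLACE(@@GLOBAL.gtid_executed, '\\n', '')"
    return [
        "docker", "exec", container,
        "mysql", "-uroot", f"-p{password}",
        "-N", "-B",  # no column names, tab-separated batch output
        "-e", sql,
    ]


def fetch_gtid_executed(container):
    """Run the query in the given container and return the GTID set string."""
    done = subprocess.run(
        build_query_command(container),
        check=True, capture_output=True, text=True,
    )
    return done.stdout.strip()


def collect_all(nodes=NODES):
    """Return {container_name: gtid_executed} for every node."""
    return {node: fetch_gtid_executed(node) for node in nodes}
```

Calling `collect_all()` once after both joiners finish and once after the write query gives the two snapshots to compare.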

I couldn’t reproduce the issue with 8.0.30. On 8.0.31 both joining nodes used different GTID sets, but this might be caused by the fix in 8.0.31 that was later reverted.
Version 8.0.34 worked fine, so the issue seems to have been introduced in 8.0.35. The release notes mention a GTID change that sounds suspicious.

Environment

None

Activity

Aaditya Dubey

September 17, 2024 at 12:09 PM

Hi @Oliver Dala

Yes, I’m exploring the other way around.

Oliver Dala

September 17, 2024 at 12:08 PM

I’m afraid I don’t know how to help you. Maybe use another type of test environment if this doesn’t work for you? I don’t think the bug itself is Docker-specific.

Aaditya Dubey

September 17, 2024 at 7:31 AM

Hi @Oliver Dala

Please ignore the query. However, I'm still having trouble with the above error.

Oliver Dala

September 17, 2024 at 7:09 AM

I don’t understand what you mean by that?

Aaditya Dubey

September 16, 2024 at 2:14 PM

Hi @Oliver Dala

Thank you for the updates. Do we really need 180+ inactive threads to simulate the issue?

Details

Assignee

Aaditya Dubey

Reporter

Oliver Dala

Needs QA

Yes

Affects versions

8.0.35-27 (Q4 2023)

8.0.36-28 (Q1 2024)

Priority

Medium

Created September 11, 2024 at 9:03 AM

Updated April 9, 2025 at 7:34 AM
