MySQL GTID metadata not synced upon full SST
5.6
commit f8fccb2cd9406b8c990b887ab61adfa7800713f6
Author: Kenn Takara <kenn.takara@percona.com>
Date: Thu Jun 1 18:21:30 2017 -0700
5.7
commit a18f83b6673f0139e1a3283701458e1c905ae807
Author: Kenn Takara <kenn.takara@percona.com>
Date: Thu Jun 1 18:21:30 2017 -0700
Fix PXC-827 (https://perconadev.atlassian.net/browse/PXC-827): MySQL GTID metadata not synced upon full SST
Issue:
The SST did not correctly handle the case where the binlog names on the
donor and joiner differ (they were assumed to be the same).
On 5.6, with GTIDs enabled, this would lead to differences in the
gtid_executed table. Note that 5.7 does not rely on the binlog,
but just restores the gtid_executed table.
Solution:
XtraBackup only copies the last binlog (to recover the GTIDs). So
we ensure that the binlog is renamed and copied correctly. This
requires that we send over the name of the binlog on the donor. To
enable this, a change was made to how this data is sent over (instead
of a pure one-line data file, we send a cnf-like file that is parsed
on the joiner to recover the values).
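As a rough illustration of the cnf-like transfer file described above, the following Python sketch parses such a file into key/value pairs. The section name `[sst]`, the keys `binlog-name` and `galera-gtid`, and the sample values are all hypothetical; the actual SST script is a shell script and its file layout may differ.

```python
import configparser

# Hypothetical contents of the metadata file sent from donor to joiner.
# Section and key names are illustrative assumptions, not the real format.
SAMPLE_SST_INFO = """\
[sst]
galera-gtid = 00000000-0000-0000-0000-000000000000:0
binlog-name = percona1-bin.000003
"""


def read_sst_info(text):
    """Parse a cnf-like string and return the [sst] section as a dict."""
    # Only '=' is treated as a delimiter so values may contain ':'.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return dict(parser["sst"])


info = read_sst_info(SAMPLE_SST_INFO)
print(info["binlog-name"])
```

Compared with a one-line data file, a keyed format lets additional values (such as the donor's binlog name) be added without breaking the parser on the receiving side.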
On PXC 5.6 (using PXB 2.4.7):
On the donor, PXB copies the last binlog (which contains the executed-GTID information) along with the backup; the file's location does not matter.
On the joiner, the donor's binlog file arrives in the datadir. The SST script then tries to copy this file to the correct location, but it assumes that the joiner uses the same name for the binlog file. When the names differ, the copy fails and gtid_executed is not updated.
For 5.7, this was discussed a while ago: PXB does not need the last binlog in order to restore the gtid_executed table.
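The renaming step for 5.6 can be sketched as follows, assuming the joiner knows the donor's binlog file name and its own `log-bin` base path. The function name and the example paths are hypothetical, and the real logic lives in the SST shell script; this only shows the name mapping.

```python
import os


def joiner_binlog_path(donor_binlog, joiner_log_bin):
    """Map the donor's last binlog file onto the joiner's log-bin base name."""
    # Keep only the numeric suffix (for example ".000003") from the donor file.
    _, suffix = os.path.splitext(os.path.basename(donor_binlog))
    return joiner_log_bin + suffix


# Hypothetical paths: the donor and joiner use different binlog base names.
target = joiner_binlog_path("/var/lib/mysql/percona1-bin.000003",
                            "/data/binlogs/percona2-bin")
print(target)
```

Copying the donor's file to a path built this way avoids relying on both nodes sharing the same binlog name.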
Verified on 5.6.35
Reproduced this. The root cause is that the binlogs from the donor are not what the joiner expects. This can arise from:
(1) binlogs not in the datadir
The donor only copies the datadir over and does not check for binlogs located outside the datadir.
(2) 'log-bin-index' is used and has a different name from 'log-bin'
The SST script expects the index file to have the same name.
(3) 'log-bin' on the donor is different from 'log-bin' on the joiner
The joiner expects the binlogs to have the same name/path. If the names differ, the files are copied over but are not used by the joiner.
Workaround: ensure that the same names are used for the binlogs.
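One way to check the workaround before adding a node is to compare the binlog-related options in the two nodes' configuration files. The sketch below is a hypothetical helper, not part of PXC; it assumes the options live in the `[mysqld]` section and that the second node's file uses `percona2-bin`, which is an invented value for illustration.

```python
import configparser

DONOR_CNF = """\
[mysqld]
server-id=1
gtid_mode=on
log_slave_updates
log-bin=percona1-bin
"""

# Hypothetical joiner config with a different binlog base name.
JOINER_CNF = DONOR_CNF.replace("percona1-bin", "percona2-bin")


def binlog_settings(cnf_text):
    """Return the log-bin and log-bin-index values from a [mysqld] section."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(cnf_text)
    # MySQL treats '-' and '_' in option names as equivalent.
    options = {k.replace("_", "-"): v for k, v in parser["mysqld"].items()}
    return {k: options.get(k) for k in ("log-bin", "log-bin-index")}


def mismatched(a, b):
    """List the option names whose values differ between two nodes."""
    return sorted(k for k in a if a[k] != b[k])


diff = mismatched(binlog_settings(DONOR_CNF), binlog_settings(JOINER_CNF))
print(diff)
```

An empty list means the names match, which is the condition the workaround asks for.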
https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1690398
Tested with Server version: 5.6.35-81.0-56-log Percona XtraDB Cluster (GPL), Release rel81.0, Revision 7f9b6ae, WSREP version 26.20, wsrep_26.20
When a new node is added to a cluster in which all nodes are GTID enabled, the joiner's initial GTID position is unset after a successful full SST.
Test case:
(1) Bootstrap the first node with these settings:
server-id=1
enforce_gtid_consistency=1
gtid_mode=on
log_slave_updates
log-bin=percona1-bin
(2) Execute some writes.
(3) Start the second node with the same settings.
Example result:
percona1 mysql> show global variables like 'gtid%';
+----------------------+--------------------------------------------+
| Variable_name        | Value                                      |
+----------------------+--------------------------------------------+
| gtid_deployment_step | OFF                                        |
| gtid_executed        | 102cd5f1-0628-ee19-4d29-e8233b126f5f:1-317 |
| gtid_mode            | ON                                         |
| gtid_owned           |                                            |
| gtid_purged          |                                            |
+----------------------+--------------------------------------------+
5 rows in set (0.00 sec)
percona2 mysql> show global variables like 'gtid%';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| gtid_deployment_step | OFF   |
| gtid_executed        |       |
| gtid_mode            | ON    |
| gtid_owned           |       |
| gtid_purged          |       |
+----------------------+-------+
5 rows in set (0.03 sec)
(4) Execute more writes and observe that the positions are out of sync:
percona1 mysql> show global variables like 'gtid_e%';
+---------------+--------------------------------------------+
| Variable_name | Value                                      |
+---------------+--------------------------------------------+
| gtid_executed | 102cd5f1-0628-ee19-4d29-e8233b126f5f:1-448 |
+---------------+--------------------------------------------+
1 row in set (0.00 sec)
percona2 mysql> show global variables like 'gtid_e%';
+---------------+--------------------------------------------+
| Variable_name | Value                                      |
+---------------+--------------------------------------------+
| gtid_executed | 102cd5f1-0628-ee19-4d29-e8233b126f5f:1-131 |
+---------------+--------------------------------------------+
1 row in set (0.00 sec)
A quick test on PXC 5.7.16 worked, though: the GTID was synced properly.