Modify wsrep_ignore_apply_errors variable default to restore 5.x behavior

Description

On PXC 5.7 and earlier versions, an inconsistency crashes the node with the following messages:

2020-05-15T21:13:10.839727Z 2 [Note] WSREP: Applier statement rollback needed
2020-05-15T21:13:10.879650Z 2 [ERROR] WSREP: Failed to apply trx: source: 8da83f2a-96f0-11ea-a319-a6f3c0726787 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 8 trx_id: 74 seqnos (l: 5, g: 7, s: 6, d: 6, ts: 7874185270904679)
2020-05-15T21:13:10.879705Z 2 [ERROR] WSREP: Failed to apply trx 7 4 times
2020-05-15T21:13:10.879717Z 2 [ERROR] WSREP: Node consistency compromised, aborting...

 

On PXC 8, an inconsistency does NOT crash the node; it only prints the following warnings:

2020-05-15T21:07:15.169339Z 2 [Warning] [MY-000000] [WSREP] Ignoring error 'Can't find record in 't1'' on Delete_rows event. Error_code: 1032
2020-05-15T21:07:15.169409Z 2 [Warning] [MY-010584] [Repl] Slave SQL: Could not execute Delete_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 0, Error_code: MY-001032

 

How to reproduce:

node1:

create database test;

use test;

create table t1 (id int auto_increment primary key);

set sql_log_bin = 0 ; # introduce inconsistency

insert into t1 values (1);

set sql_log_bin = 1;

delete from t1 where id = 1;

 

Check error.log on node 2. On PXC 5.7 this test detects the inconsistency and shuts down node 2, while on PXC 8 it only prints a warning.

Node 2 strict mode:

PXC: root@localhost ((none)) > show global variables like '%strict%';
+--------------------+-----------+
| Variable_name      | Value     |
+--------------------+-----------+
| innodb_strict_mode | ON        |
| pxc_strict_mode    | ENFORCING |
+--------------------+-----------+
2 rows in set (0.00 sec)

 

Suggested fix: restore the 5.7 behavior, i.e. shut down the node when an inconsistency is detected. Alternatively, document the new behavior as a behavior change in PXC 8.

Environment

None

Activity

Marcelo Altmann May 20, 2020 at 7:20 PM

Analysis

Galera 4 (I think as part of the work on inconsistency voting) added the ability to control the behavior when an inconsistency is detected.

We don't have voting at the moment, but we do have the option to ignore apply errors.

There is an enum that defines the current options https://github.com/percona/percona-xtradb-cluster/blob/8.0/sql/wsrep_mysqld.h#L119 :

enum enum_wsrep_ignore_apply_error {
  WSREP_IGNORE_ERRORS_NONE = 0x0,
  WSREP_IGNORE_ERRORS_ON_RECONCILING_DDL = 0x1,
  WSREP_IGNORE_ERRORS_ON_RECONCILING_DML = 0x2,
  WSREP_IGNORE_ERRORS_ON_DDL = 0x4,
  WSREP_IGNORE_ERRORS_MAX = 0x7
};

This is controlled by https://github.com/percona/percona-xtradb-cluster/blob/8.0/sql/sys_vars.cc#L7815 :

static Sys_var_uint Sys_wsrep_ignore_apply_errors(
    "wsrep_ignore_apply_errors", "Ignore replication errors",
    GLOBAL_VAR(wsrep_ignore_apply_errors), CMD_LINE(REQUIRED_ARG),
    VALID_RANGE(WSREP_IGNORE_ERRORS_NONE, WSREP_IGNORE_ERRORS_MAX),
    DEFAULT(7), BLOCK_SIZE(1));

The default is 7 (WSREP_IGNORE_ERRORS_MAX), whereas in the 5.x series there was no way to control this and the behavior was equivalent to WSREP_IGNORE_ERRORS_NONE = 0x0.

With the current default, any inconsistency error is converted to a warning and the node does not abort.
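To make the bitmask easier to read, here is a minimal standalone sketch that decodes a wsrep_ignore_apply_errors value into the flag names from the enum quoted in this comment. The helpers describe_ignore_apply_errors and is_strict are hypothetical and written only for illustration; they are not part of the PXC source, and the sketch does not reproduce how the applier actually consults the variable.

```cpp
#include <string>
#include <vector>

// Flag values copied from the enum quoted in this comment (sql/wsrep_mysqld.h).
enum enum_wsrep_ignore_apply_error {
  WSREP_IGNORE_ERRORS_NONE = 0x0,
  WSREP_IGNORE_ERRORS_ON_RECONCILING_DDL = 0x1,
  WSREP_IGNORE_ERRORS_ON_RECONCILING_DML = 0x2,
  WSREP_IGNORE_ERRORS_ON_DDL = 0x4,
  WSREP_IGNORE_ERRORS_MAX = 0x7
};

// Hypothetical helper (not in PXC): list the names of the flags set in mask.
std::vector<std::string> describe_ignore_apply_errors(unsigned int mask) {
  std::vector<std::string> names;
  if (mask & WSREP_IGNORE_ERRORS_ON_RECONCILING_DDL)
    names.push_back("WSREP_IGNORE_ERRORS_ON_RECONCILING_DDL");
  if (mask & WSREP_IGNORE_ERRORS_ON_RECONCILING_DML)
    names.push_back("WSREP_IGNORE_ERRORS_ON_RECONCILING_DML");
  if (mask & WSREP_IGNORE_ERRORS_ON_DDL)
    names.push_back("WSREP_IGNORE_ERRORS_ON_DDL");
  return names;
}

// Hypothetical helper (not in PXC): true when no ignore flag is set,
// i.e. the value corresponds to WSREP_IGNORE_ERRORS_NONE.
bool is_strict(unsigned int mask) {
  return (mask & static_cast<unsigned int>(WSREP_IGNORE_ERRORS_MAX)) ==
         static_cast<unsigned int>(WSREP_IGNORE_ERRORS_NONE);
}
```

Read this way, the default of 7 sets all three flags, while 0 sets none and corresponds to the 5.x behavior. Judging by its name alone, WSREP_IGNORE_ERRORS_ON_RECONCILING_DML (0x2) looks like the flag that covers the ignored Delete_rows error from the report, but that mapping is an assumption and is not confirmed here.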

Done

Details

Needs Review

Yes

Needs Doc

Yes

Time tracking

1d 5h 47m logged, 40m remaining

Created May 15, 2020 at 10:04 PM
Updated March 6, 2024 at 9:32 PM
Resolved June 4, 2020 at 8:42 AM
