Node crash after 60s network failure

Vadim Tkachenko February 10, 2021 at 1:55 PM
Additional information for this bug.
I can reproduce this 100% of the time with the following steps:
Form a 3-node cluster (with the Operator), load sysbench-tpcc data, start the sysbench workload, and stop node-1 for 60s.
However, I cannot reproduce it in the following case:
Form a 1-node cluster, load sysbench-tpcc data, extend the cluster size to 3 (data is copied via SST), start the sysbench workload, and stop node-1 for 60s.
In this case node-1 rejoins the cluster successfully.
My sysbench script, for reference:

Vadim Tkachenko February 8, 2021 at 6:56 PM
This seems to be a duplicate of https://jira.percona.com/browse/PXC-3437,
but hopefully it contains more information for reproduction.

Vadim Tkachenko February 8, 2021 at 6:54 PM
I am using PXC Operator 1.7.0 with all default deployment settings.
My chaos definition file:
Status: Done
Details
Created February 8, 2021 at 6:45 PM
Updated 4 days ago
Resolved February 12, 2025 at 12:58 PM
I am not sure whether this is a PXC or an Operator issue. I am continuing my work on chaos testing, and in this case I emulate a 60s network failure on Node1 while continuing the load on Node0 and Node2.
My expectation is that after 60s Node1 will rejoin the cluster and perform either SST or IST.
Unfortunately, I see the node crash while it is trying to rejoin.
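For background on the SST-or-IST expectation above: a rejoining Galera node can use IST only if the donor's gcache still holds every write-set the joiner missed; otherwise a full SST is required. With a 60s outage under sysbench load, IST is possible only if gcache is large enough to hold the writes made during that window. The Python sketch below models that decision in simplified form. `DonorGcache` and `choose_state_transfer` are hypothetical names for illustration only; the real Galera logic also checks the cluster state UUID, donor selection, and other conditions that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class DonorGcache:
    """Hypothetical model of the seqno range held in a donor's gcache."""
    first_seqno: int  # oldest write-set still cached on the donor
    last_seqno: int   # most recent committed write-set in the cluster


def choose_state_transfer(joiner_last_seqno: int, donor: DonorGcache) -> str:
    """Simplified decision: "IST" if the donor's gcache covers every
    write-set the joiner missed, "SST" otherwise, "NONE" if up to date.

    Assumption: a negative seqno (e.g. -1) means the joiner's position is
    unknown, so this model falls back to a full SST.
    """
    if joiner_last_seqno < 0:
        # Unknown position: only a full state snapshot is safe in this model.
        return "SST"
    needed_from = joiner_last_seqno + 1  # first write-set the joiner lacks
    if needed_from > donor.last_seqno:
        # Joiner already has everything; nothing to transfer.
        return "NONE"
    if needed_from >= donor.first_seqno:
        # All missing write-sets are still in the donor's gcache.
        return "IST"
    # Some missing write-sets were already evicted from gcache.
    return "SST"
```

For example, with a donor gcache covering seqnos 1000 to 5000, a joiner that stopped at seqno 4200 can use IST, while one that stopped at seqno 500 needs SST because write-sets 501 to 999 are no longer cached.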