Second node (XXX-pxc-1) always selected as donor

Description

I can't repair the second node of the cluster because it chooses itself as a donor:

2020-08-06T08:16:31.234988Z 0 [Warning] WSREP: Member 0.0 (api-pxc-1) requested state transfer from 'api-pxc-1', but it is impossible to select State Transfer donor: Host is down
2020-08-06T08:16:31.235053Z 2 [ERROR] WSREP: Requesting state transfer failed: -112(Host is down)
2020-08-06T08:16:31.235088Z 2 [ERROR] WSREP: State transfer request failed unrecoverably: 112 (Host is down). Most likely it is due to inability to communicate with the cluster primary component. Restart required.

(full logs in attachment)

Steps to reproduce:

  1. create 3-node cluster

  2. delete pvc datadir-cluster-pxc-1 and pod cluster-pxc-1

  3. pod cluster-pxc-1 is now in CrashLoopBackOff

Environment

None

Attachments

1
  • 06 Aug 2020, 08:42 AM

Activity

Tomislav Plavcic September 11, 2020 at 7:11 PM

Hi, thanks for the nice bug report and proposal for the solution!
Some background: node-0 is the writer by default, and while node-0 is being upgraded the highest-numbered node is selected as the writer in ProxySQL and HAProxy. The intention was to exclude node-0 and the highest node (node-2 in this case) from the donor list, so that the current writer node never becomes the donor.
One issue is that the trailing comma in the wsrep_sst_donor option is removed with "sed 's/,$//'". That matters because of how the donor is selected:

It first looks at the nodes specified in the donor list (irrespective of their segment). If no suitable donor is still found, the rest of the donor nodes are checked for suitability only if the donor list has a "terminating-comma".

Because the comma is stripped, the request simply fails. This is not a problem in a 5-node cluster, where other nodes remain in the list, but in a 3-node cluster node-1 tries to do SST from itself (node-0 and node-2 have been removed from the option), and Galera has been prevented from searching for other donors.
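To make the failure concrete, here is a minimal sketch of the donor-list pipeline (the line removed by the proposed patch) as it would run on node-1 of a 3-node cluster. The host names, the contents of PEERS, and the variables NODE and WITH_COMMA are assumptions for illustration only; the real script uses ${HOSTNAME} and builds PEERS itself. GNU sort is assumed for --version-sort.

```shell
#!/usr/bin/env bash
# Assumed values: the ticket does not show what PEERS actually contains.
PEERS=(cluster-pxc-0 cluster-pxc-2)
NODE=cluster-pxc-1   # stands in for ${HOSTNAME} on the broken pod

# Original pipeline: drop node-0 (grep -v), drop the highest node (sed '$d'),
# join with commas, then strip the trailing comma.
DONOR_ADDRESS="$(printf '%s\n' "${PEERS[@]}" "${NODE}" | sort --version-sort | uniq \
  | grep -v -- '-0$' | sed '$d' | tr '\n' ',' | sed 's/,$//')"

# Same pipeline without the final sed, so the terminating comma survives.
WITH_COMMA="$(printf '%s\n' "${PEERS[@]}" "${NODE}" | sort --version-sort | uniq \
  | grep -v -- '-0$' | sed '$d' | tr '\n' ',')"

echo "wsrep_sst_donor=${DONOR_ADDRESS}"   # prints wsrep_sst_donor=cluster-pxc-1
echo "wsrep_sst_donor=${WITH_COMMA}"      # prints wsrep_sst_donor=cluster-pxc-1,
```

With three nodes the only name left in the list is the node itself. The second form merely illustrates the terminating-comma rule quoted above; it is not necessarily the change that was shipped.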

Serhii Prykhodko August 6, 2020 at 9:21 AM

proposed patch for pxc-configure-pxc.sh:

- DONOR_ADDRESS="$(printf '%s\n' "${PEERS[@]}" "${HOSTNAME}" | sort --version-sort | uniq | grep -v -- '-0$' | sed '$d' | tr '\n' ',' | sed 's/,$//')"
+ DONOR_ADDRESS="$(printf '%s\n' "${PEERS[@]}" | sort -r --version-sort | uniq | sed '$d' | tr '\n' ',' | sed 's/,$//')"

  1. removed local pod hostname from donor list

  2. allow XXX-pxc-0 to be a donor (removed grep -v)

  3. reversed sort (so for pxc-1 list of donors will be pxc-2,pxc-0)
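A change like this can be checked outside the pod by running the new expression against sample input. The sketch below wraps the patched pipeline in a hypothetical helper, donor_list, and feeds it assumed peer names; neither the helper nor the names come from the script, and the ticket does not show what PEERS really holds. GNU sort is assumed.

```shell
#!/usr/bin/env bash
# donor_list is a hypothetical helper; its arguments play the role of PEERS.
donor_list() {
  printf '%s\n' "$@" | sort -r --version-sort | uniq | sed '$d' | tr '\n' ',' | sed 's/,$//'
}

# Assumed PEERS as seen from cluster-pxc-1 in a 3-node cluster (local pod not included).
DONORS_3="$(donor_list cluster-pxc-0 cluster-pxc-2)"
# Assumed PEERS as seen from cluster-pxc-1 in a 5-node cluster.
DONORS_5="$(donor_list cluster-pxc-0 cluster-pxc-2 cluster-pxc-3 cluster-pxc-4)"

echo "3 nodes: ${DONORS_3}"
echo "5 nodes: ${DONORS_5}"
```

The sed '$d' stage is still part of the expression, so the last entry after sorting is dropped; the resulting list therefore depends on exactly what PEERS contains in the real script.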

Done

Details

Time tracking

5h 10m logged

Created August 6, 2020 at 8:47 AM
Updated March 5, 2024 at 6:09 PM
Resolved September 21, 2020 at 7:16 AM
