Nodes "changing identity" can prevents primary groups
Activity
Easier steps to reproduce (a command sketch follows the list):
1. Start a 2-node cluster
2. Break network connectivity between the nodes
3. Shut down node 2
4. Bring the network up again
5. Start node 2
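For instance, with the dockerized setup from the reproduction below (container and network names are the ones used throughout this report; the sleep is an arbitrary delay):
$ docker network disconnect pxc-network pxc-node2   # break connectivity between the nodes
$ sleep 5                                           # let both sides go non-primary
$ docker stop pxc-node2                             # clean shutdown: gvwstate.dat gets deleted
$ docker network connect pxc-network pxc-node2      # bring the network up again
$ docker start pxc-node2                            # node 2 comes back with a new identity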
What happens?
When the network is down, both nodes go non-primary, so node 1 sees node 2 as partitioned.
Node 2 is shut down. During the clean shutdown, the gvwstate.dat file is deleted.
When node 2 is started after the clean shutdown, its identity changes (there is no gvwstate.dat file left, so a new UUID is generated).
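For context, the identity is the my_uuid recorded in gvwstate.dat in the data directory (the file Galera maintains for pc.recovery); its content looks roughly like this (UUIDs illustrative, taken from the repro logs below):
my_uuid: 14d9918d-852b-11ee-b408-bf2da85a5281
#vwbeg
view_id: 3 0a41a9f8-852b-11ee-ace4-46908b37ff0a 8
bootstrap: 0
member: 0a41a9f8-852b-11ee-ace4-46908b37ff0a 0
member: 14d9918d-852b-11ee-b408-bf2da85a5281 0
#vwend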
Node 1 sees node 2 joining the cluster:
The gmcast layer detects that the identity changed.
The evs layer does not keep this information.
The pc layer is notified about node 2 joining with the new identity, but it still keeps the old identity and thinks this node is partitioned. On this layer there is no link between the old and new identity, so it does not know it is the same node.
Solution: the pc layer has to be notified about the node identity change.
About "stop pxc-node3":
It used to reproduce fairly easily on PXC 5.7 because nodes aborted and shut down when non-primary, which apparently no longer happens that often in PXC 8.0.
Still, nodes do get stopped in production setups due to automation (e.g. Puppet), manual restarts, a cancelled SST, and probably other reasons I am still investigating.
So including this "stop" step does not seem like a stretch to me.
Reproduction:
1. Install regular PXC on Docker:
https://docs.percona.com/percona-xtradb-cluster/8.0/docker.html
Follow all the steps as written; no modification is needed.
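For reference, the gist of that page using the names from this report; this is only a rough sketch, the exact environment variables and the SSL-certificate volume steps are on the linked page:
$ docker network create pxc-network
$ docker run -d --name pxc-node1 --net pxc-network -e MYSQL_ROOT_PASSWORD=secret -e CLUSTER_NAME=pxc-cluster1 percona/percona-xtradb-cluster:8.0
$ docker run -d --name pxc-node2 --net pxc-network -e MYSQL_ROOT_PASSWORD=secret -e CLUSTER_NAME=pxc-cluster1 -e CLUSTER_JOIN=pxc-node1 percona/percona-xtradb-cluster:8.0
$ docker run -d --name pxc-node3 --net pxc-network -e MYSQL_ROOT_PASSWORD=secret -e CLUSTER_NAME=pxc-cluster1 -e CLUSTER_JOIN=pxc-node1 percona/percona-xtradb-cluster:8.0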
2. break network
docker network disconnect pxc-network pxc-node2 && \
docker network disconnect pxc-network pxc-node3 && \
sleep 4 && \
docker network connect pxc-network pxc-node2 && \
docker stop pxc-node3 && \
sleep 10 && \
docker network connect pxc-network pxc-node3 && \
docker start pxc-node3
=> this first disconnects the network on pxc-node2 and pxc-node3.
Then, after some time to let node1 go non-primary, node2 is reconnected. Node1 will keep a non-primary view:
view (view_id(NON_PRIM,0a41a9f8-ace4,8)
memb {
0a41a9f8-ace4,0
14d9918d-b408,0
}
joined {
}
left {
}
partitioned {
2740f256-b5af,0
}
)
2023-11-17T09:28:16.952178Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
2023-11-17T09:28:16.952348Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2023-11-17T09:28:16.952384Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
2023-11-17T09:28:16.952532Z 10 [Note] [MY-000000] [Galera] Maybe drain monitors from 23 upto current CC event 23 upto:23
2023-11-17T09:28:16.952584Z 10 [Note] [MY-000000] [Galera] Drain monitors from 23 up to 23
2023-11-17T09:28:16.952647Z 10 [Note] [MY-000000] [Galera] ================================================
View:
id: ffbdd994-852a-11ee-984f-a2ee0016e90a:23
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(2):
0: 0a41a9f8-852b-11ee-ace4-46908b37ff0a, b25c2c75f7f0
1: 14d9918d-852b-11ee-b408-bf2da85a5281, unspecified
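As a side check, the non-primary state is also visible through the standard wsrep status variable (connection details and credentials depend on the setup):
$ mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+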
Alternative: reconnecting the network on node2 and node3 at the same time would enable a "merge quorum".
Next, stop node3, reconnect its network, and restart it.
Alternative: in this case too, just reconnecting the network would enable the merge quorum again.
All nodes will stay non-primary:
2023-11-17T09:28:36.685042Z 0 [Note] [MY-000000] [Galera] remote endpoint ssl://172.18.0.4:4567 changed identity 2740f256-852b-11ee-b5af-fa54eddf6d79 -> af558e4f-852b-11ee-b064-fa529c359b49
2023-11-17T09:28:37.186202Z 0 [Note] [MY-000000] [Galera] declaring 14d9918d-b408 at ssl://172.18.0.3:4567 stable
2023-11-17T09:28:37.186253Z 0 [Note] [MY-000000] [Galera] declaring af558e4f-b064 at ssl://172.18.0.4:4567 stable
2023-11-17T09:28:37.187867Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,0a41a9f8-ace4,9)
memb {
0a41a9f8-ace4,0
14d9918d-b408,0
af558e4f-b064,0
}
joined {
}
left {
}
partitioned {
2740f256-b5af,0
}
)
...
2023-11-17T09:28:37.188392Z 10 [Note] [MY-000000] [Galera] ================================================
View:
id: ffbdd994-852a-11ee-984f-a2ee0016e90a:23
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(3):
0: 0a41a9f8-852b-11ee-ace4-46908b37ff0a, b25c2c75f7f0
1: 14d9918d-852b-11ee-b408-bf2da85a5281, unspecified
2: af558e4f-852b-11ee-b064-fa529c359b49, unspecified
Now, restarting node3 in a loop will make its old identities pile up as duplicates:
docker restart pxc-node3
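Repeated a few times, e.g. like this (iteration count and sleep are arbitrary; the sleep just leaves the node time to rejoin between restarts):
$ for i in 1 2 3; do docker restart pxc-node3; sleep 30; done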
2023-11-17T09:30:04.794821Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,0a41a9f8-ace4,12)
memb {
0a41a9f8-ace4,0
14d9918d-b408,0
}
joined {
}
left {
}
partitioned {
2740f256-b5af,0
af558e4f-b064,0
c2c3810d-b787,0
cecda587-bd53,0
}
)
The nodes then won't forget about the last identity; for some reason it is never cleaned up:
$ sudo docker logs pxc-node1 2>&1 | grep "reconnecting.*attempt"
2023-11-17T09:26:45.762170Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to 14d9918d-b408 (ssl://172.18.0.3:4567), attempt 0
2023-11-17T09:26:45.762484Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to 2740f256-b5af (ssl://172.18.0.4:4567), attempt 0
2023-11-17T09:28:14.289396Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to 14d9918d-b408 (ssl://172.18.0.3:4567), attempt 0
2023-11-17T09:28:14.290042Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to 2740f256-b5af (ssl://172.18.0.4:4567), attempt 0
2023-11-17T09:30:02.323458Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 0
2023-11-17T09:32:05.356833Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 30
2023-11-17T09:34:07.891571Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 60
2023-11-17T09:36:09.946593Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 90
2023-11-17T09:38:10.005537Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 120
2023-11-17T09:40:10.057834Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 150
2023-11-17T09:42:10.114695Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 180
2023-11-17T09:44:10.173636Z 0 [Note] [MY-000000] [Galera] (0a41a9f8-ace4, 'ssl://0.0.0.0:4567') reconnecting to cecda587-bd53 (ssl://172.18.0.4:4567), attempt 210
Will do, working on it
Please provide steps to reproduce.
After certain issues, node logs can be flooded with "changed identity" events:
$ grep -r "<redacted>:4567 changed identity" | cut -d ' ' -f6- | sed 's/<redacted>/some_ip/'
remote endpoint tcp://some_ip:4567 changed identity 0835f219-4a3c-11ee-9e2e-933d6d2ab80e -> 3b4fbe1c-6657-11ee-9609-c2f41c8b7c49
remote endpoint tcp://some_ip:4567 changed identity 3b4fbe1c-6657-11ee-9609-c2f41c8b7c49 -> 2adbe2b4-6658-11ee-b624-3f651b28ac02
remote endpoint tcp://some_ip:4567 changed identity 2adbe2b4-6658-11ee-b624-3f651b28ac02 -> 053ad113-6659-11ee-8b12-af77b903282b
remote endpoint tcp://some_ip:4567 changed identity 053ad113-6659-11ee-8b12-af77b903282b -> b87028d5-6659-11ee-a3d6-53b4d7dff1fd
remote endpoint tcp://some_ip:4567 changed identity b87028d5-6659-11ee-a3d6-53b4d7dff1fd -> 6b1503af-665a-11ee-aa8c-7208ea56ed9b
remote endpoint tcp://some_ip:4567 changed identity 6b1503af-665a-11ee-aa8c-7208ea56ed9b -> 1dda1b2d-665b-11ee-afb7-6ee866924b61
remote endpoint tcp://some_ip:4567 changed identity 1dda1b2d-665b-11ee-afb7-6ee866924b61 -> d07aed20-665b-11ee-8c06-4fbfab9694e5
remote endpoint tcp://some_ip:4567 changed identity d07aed20-665b-11ee-8c06-4fbfab9694e5 -> 837e50c4-665c-11ee-af4a-ea1aa23d1738
remote endpoint tcp://some_ip:4567 changed identity 837e50c4-665c-11ee-af4a-ea1aa23d1738 -> 3619e66b-665d-11ee-9145-da03a548dad7
remote endpoint tcp://some_ip:4567 changed identity 3619e66b-665d-11ee-9145-da03a548dad7 -> e8fe2a07-665d-11ee-9dbe-6eac1cc1a45b
remote endpoint tcp://some_ip:4567 changed identity e8fe2a07-665d-11ee-9dbe-6eac1cc1a45b -> 9ba1b041-665e-11ee-b76d-aa29dd00b6f0
remote endpoint tcp://some_ip:4567 changed identity 9ba1b041-665e-11ee-b76d-aa29dd00b6f0 -> 4e8a6721-665f-11ee-876b-471624b7736a
remote endpoint tcp://some_ip:4567 changed identity 4e8a6721-665f-11ee-876b-471624b7736a -> 30dadc7c-6660-11ee-8034-227b5dcc07fc
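A quick way to gauge how much a node is affected, e.g. with the docker setup from above (on a regular install, grep the mysqld error log instead):
$ docker logs pxc-node1 2>&1 | grep -c "changed identity"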
Ultimately this gives views like the following (it is supposed to be a 3-node cluster):
view (view_id(NON_PRIM,30dadc7c-8034,1423)
memb {
30dadc7c-8034,0
3f9a75d8-b5a5,0
}
joined {
}
left {
}
partitioned {
02a66439-a1be,0
052cd3f8-8cf8,0
053ad113-8b12,0
0835f219-9e2e,0
1dda1b2d-afb7,0
1dfbcd84-8faf,0
27c0167e-989d,0
2adbe2b4-b624,0
3619e66b-9145,0
3b4fbe1c-9609,0
4e8a6721-876b,0
52ca4a52-9eb2,0
6b1503af-aa8c,0
6b3f10b3-98a6,0
83612685-96de,0
837e50c4-af4a,0
9ba1b041-b76d,0
9ffacd87-abdb,0
b83c6a85-8fc5,0
b87028d5-a3d6,0
d07aed20-8c06,0
d0ffaaa1-bd33,0
e8fe2a07-9dbe,0
}
)
This only provokes non-primary, even when a majority of nodes should have been able to merge quorum.
Translating the UUIDs to node names shows that the huge "partitioned" list is the same nodes over and over again:
view (view_id(PRIM,node0,1424)
memb {
node0,0
}
joined {
}
left {
}
partitioned {
node2,0
node2,0
node1,0
node1,0
node1,0
node2,0
node2,0
node1,0
node1,0
node1,0
node1,0
node1,0
node2,0
node1,0
node2,0
node2,0
node1,0
node1,0
node2,0
node2,0
node1,0
node1,0
node2,0
node1,0
}
)
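For reference, a rough sketch of how such a translation can be scripted by walking the changed-identity chain in the log; the seed UUID prefixes and the log path below are placeholders to adapt:
#!/bin/bash
# Map every UUID a node has ever had back to one stable name by following
# the "changed identity OLD -> NEW" events in the error log.
declare -A name=( [aaaaaaaa]=node1 [bbbbbbbb]=node2 )   # placeholder seeds: each node's first-boot UUID prefix
while read -r old new; do
  # the new UUID inherits the name attached to the old one
  name[${new%%-*}]=${name[${old%%-*}]:-unknown}
done < <(grep -oE 'changed identity [0-9a-f-]+ -> [0-9a-f-]+' mysqld.log | awk '{print $3, $5}')
for uuid in "${!name[@]}"; do echo "$uuid => ${name[$uuid]}"; done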