Issue when re-joining node after clean shutdown + long hangs
Description
Environment
Debian Buster, latest updates, no firewall.
Attachments: 10
Activity
Aaditya Dubey March 26, 2024 at 8:35 AM
Hi,
Due to other priority tasks, we have not been able to look into this further. We will take a look and update you shortly.

Marc Bernard March 22, 2024 at 3:53 PM
It is quite disappointing that nobody even took the time to look at the error code and point to why it is crashing.
I understand EOL, but there are still millions of people using 5.7, and a lot of systems will continue using 5.7 for a while.

Aaditya Dubey January 15, 2024 at 2:59 PM
Hi,
I am extremely sorry that you did not get the required feedback. I am working on reproducing this; once it reproduces and is fixed, you may move to 8.0.
Thank you for contacting Percona!

Marc Bernard August 9, 2023 at 5:34 PM
Unfortunately, I have abandoned MySQL 8 and since reverted to 5.7 due to lack of feedback here.

Aaditya Dubey January 19, 2023 at 10:06 AM
Hi,
Please let me know if the issue persists.
Created April 19, 2022 at 8:56 PM
Updated September 7, 2024 at 3:28 PM
I have the following setup for cluster-to-cluster replication:
(Source Legacy) (MySQL 5.7.36-1debian10) (master for replica) gtid_mode=ON_PERMISSIVE
(Source Cluster) 4-node Galera cluster (XtraDB 8.0.2-16) (node 3 is slave, node 4 is master replica) gtid_mode=ON
(Target Cluster) 3-node Galera cluster (XtraDB 8.0.2-16) (node 3 is slave replica) gtid_mode=ON
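For context, the relevant GTID settings on the replica node of the target cluster look roughly like this (a minimal sketch; the server ID, provider path, and cluster addresses below are placeholders, not my real values — full configs to follow):

```ini
# Target cluster, node 3 (replica) — illustrative values only
[mysqld]
gtid_mode                = ON
enforce_gtid_consistency = ON
log_slave_updates        = ON
server_id                = 33                                  # placeholder
wsrep_provider           = /usr/lib/galera4/libgalera_smm.so   # placeholder path
wsrep_cluster_address    = gcomm://10.0.0.1,10.0.0.2,10.0.0.3  # placeholder IPs
```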
Whenever I cleanly shut down a node on the target cluster, the node never re-joins the cluster on the first try. Every time, it just hangs there, stuck in JOINED, until I restart the node again; then it takes a bit of time and shows SYNCED.
I have reproduced this on two completely independent target clusters. The problem reproduces every time.
If I stop the replica before attempting a restart, sometimes the node re-joins on the first attempt, but then replication stays stuck for a while before it starts replicating again.
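The state I am watching is wsrep_local_state_comment; for completeness, this is roughly how I stop the replica and check the node state (standard commands, nothing exotic):

```sql
-- On the node about to be restarted (target cluster, node 3):
STOP SLAVE;                                   -- stop async replication first
SHOW STATUS LIKE 'wsrep_local_state_comment'; -- expect 'Synced' on a healthy node
-- After restart, poll until the node leaves 'Joined' and shows 'Synced' again:
SHOW STATUS LIKE 'wsrep_local_state_comment';
```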
I have not been able to reproduce this on 5.7.36-39
–
Delay between CLOSED and SYNCED on 8.0.x (ignoring the 1st attempt, which failed):
2022-04-19T*20:04:13*.737230Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 136473017)
. . .
2022-04-19T*20:11:35*.043387Z 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 136568537)
Delay between CLOSED and SYNCED on 5.7.x (on 1st attempt):
2022-04-19T*20:23:46*.663530Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 3960502)
. . .
2022-04-19T*20:24:17*.521708Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3960515)
Delay between CLOSED and SYNCED on 8.0.x (with replication stopped, 1st attempt):
2022-04-19T*20:36:15*.172049Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 136852454)
. . .
2022-04-19T*20:36:44*.884421Z 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 136852456)
. . .
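To measure the delays above, I am just grepping the Galera state transitions out of the error log. A minimal sketch, using two of the lines quoted in this report as sample input (the log path is an assumption — adjust for your install):

```shell
# Sample lines copied from this report; in practice, point the grep at the
# node's error log (path varies by install, e.g. /var/log/mysql/error.log).
cat > /tmp/galera-sample.log <<'EOF'
2022-04-19T20:04:13.737230Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 136473017)
2022-04-19T20:11:35.043387Z 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 136568537)
EOF
# Extract just the state transitions to eyeball the CLOSED -> SYNCED delay
grep -oE 'Shifting [A-Z]+ -> [A-Z]+' /tmp/galera-sample.log
```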
When replication is started again, I get the following error (multiple times), and replication does not propagate (it hangs) until I restart that node again. The same error is also present on the 1st attempt when replication is kept on.
2022-04-19T20:38:29.305201Z 10 [Warning] [MY-000000] [Galera] trx protocol version: 5 does not match certification protocol version: -1
–
I will post configs and logs in a separate update.