Node crashes with Transport endpoint is not connected

Description

Upgrading the WSREP version from 26.4.3 to 26.4.12 should solve a crash triggered by some vulnerability scanning utilities.

Affects 8.0.27, but it should also affect 8.0.30, as the wsrep version is the same.

Environment

None

AFFECTED CS IDs

CS0034242

Activity

Kamil Holubicki 
September 22, 2023 at 11:25 AM

A post-merge fix will be provided in 8.0.34 by this PR:
https://github.com/percona/galera/pull/270

Neil Billett 
September 22, 2023 at 8:23 AM

Hi,

As per the discussion here: https://forums.percona.com/t/occasional-db-crashes-in-pxc-8-0-32-around-remote-endpoint-transport-endpoint-is-not-connected/25158 we believe this is still an issue in PXC 8.0.32.

I’ve been able to replicate the crash with a two-node cluster on PXC 8.0.32 using nmap’s TCP connect scan against port 4567 on both nodes.

I’ve got our application connected to <node1> (generating some writeset changes) and if I leave this running from a third host:

…I see a lot of these entries in both node logs as the commands loop:

…and after some time (usually minutes) <node1> falls over, e.g.:

Hope it’s helpful!

 

yoann.lacancellera 
March 10, 2023 at 11:08 AM

Thank you, this really helps

 

Sadly, nothing was documented in the PXC releases about the wsrep version, and the variables still show 26.4.3

 

I found out I should have checked the GALERA_VERSIONS file in the submodules

 

Kamil Holubicki 
March 3, 2023 at 4:57 PM

Minimal testcase assuming default Galera communication port 4567:

  1. start a single-node cluster

  2. while true; do nmap -p4567 127.0.0.1; done

It crashes immediately.
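
On a host without nmap, the loop in step 2 could be approximated with plain TCP connects. This is only a sketch: it assumes that a bare connect-and-close on the Galera port is enough to hit the same code path, which this ticket confirms for nmap only. `connect_probe` is a hypothetical helper name, it relies on bash's /dev/tcp pseudo-device, and the attempt count is bounded so the loop terminates.

```shell
# connect_probe HOST PORT COUNT
# Opens and immediately closes a TCP connection COUNT times using bash's
# /dev/tcp pseudo-device, then prints how many connects succeeded.
connect_probe() {
    local host="$1" port="$2" count="$3" ok=0 i=0
    while [ "$i" -lt "$count" ]; do
        # fd 3 is opened inside a subshell, so it is closed as soon as
        # the subshell exits; a failed connect makes the subshell fail.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok"
}
```

For example, `connect_probe 127.0.0.1 4567 1000` would make 1000 attempts against the default Galera port and print the number that connected.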

The issue was fixed by Galera upstream commit 930c016108d7086b472ad7a8b9d0f6989202b48a and is included in Galera 26.4.12, so:

8.0.27 -> galera 26.4.10 - failure
8.0.28 -> galera 26.4.11 - failure
8.0.29 -> galera 26.4.12 - works fine
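
The mapping above reduces to a comparison against Galera 26.4.12. A small sketch of that check, assuming GNU `sort -V` is available; `galera_has_fix` is a hypothetical helper, and the version string is passed in by hand rather than read from a running node.

```shell
# galera_has_fix VERSION
# Succeeds (exit status 0) when VERSION is at or above 26.4.12, the first
# Galera release reported in this ticket as containing the fix.
galera_has_fix() {
    fixed="26.4.12"
    # sort -V orders version strings numerically per component,
    # so 26.4.3 sorts before 26.4.12.
    lowest=$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}
```

Usage: `galera_has_fix 26.4.11 || echo "still affected"`.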

Done

Details

Created March 3, 2023 at 1:22 PM
Updated March 6, 2024 at 8:44 PM
Resolved March 3, 2023 at 4:58 PM