Node crashes with Transport endpoint is not connected

Done

Description

Upgrading the WSREP version from 26.4.3 to 26.4.12 should fix a crash triggered by some vulnerability scanning utilities.

This affects 8.0.27, and it should also affect 8.0.30, since the WSREP version is the same.

2023-02-28T13:18:54.794062+01:00 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
         at gcomm/src/gcomm/datagram.hpp:unserialize():133
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::system_error> >'
  what():  remote_endpoint: Transport endpoint is not connected
2023-02-28T13:19:41.105324+01:00 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
12:19:41 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Build ID: df3732f507bb44de9b2cf240d6ec633fccedccbe
Server Version: 8.0.27-18.1 Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3, wsrep_26.4.3
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x20d55fd]
/usr/sbin/mysqld(handle_fatal_signal+0x383) [0x1172a83]
/lib64/libpthread.so.0(+0xf630) [0x7f56d8736630]
/lib64/libc.so.6(gsignal+0x37) [0x7f56d6a21387]
/lib64/libc.so.6(abort+0x148) [0x7f56d6a22a78]
/lib64/libstdc++.so.6(__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f56d7331a95]
/lib64/libstdc++.so.6(+0x5ea06) [0x7f56d732fa06]
/lib64/libstdc++.so.6(+0x5ea33) [0x7f56d732fa33]
/lib64/libstdc++.so.6(+0x5ec53) [0x7f56d732fc53]
/usr/lib64/galera4/libgalera_smm.so(+0x1dbce) [0x7f56c7b9bbce]
/usr/lib64/galera4/libgalera_smm.so(+0x93f48) [0x7f56c7c11f48]
/usr/lib64/galera4/libgalera_smm.so(+0xa3dc5) [0x7f56c7c21dc5]
/usr/lib64/galera4/libgalera_smm.so(+0xa6b6a) [0x7f56c7c24b6a]
/usr/lib64/galera4/libgalera_smm.so(+0xaddaf) [0x7f56c7c2bdaf]
/usr/lib64/galera4/libgalera_smm.so(+0x8c160) [0x7f56c7c0a160]
/usr/lib64/galera4/libgalera_smm.so(+0x1c418e) [0x7f56c7d4218e]
/usr/lib64/galera4/libgalera_smm.so(+0x1c42b2) [0x7f56c7d422b2]
/lib64/libpthread.so.0(+0x7ea5) [0x7f56d872eea5]
/lib64/libc.so.6(clone+0x6d) [0x7f56d6ae99fd]

Environment

None

AFFECTED CS IDs

CS0034242

Details

Needs QA

Yes

Created March 3, 2023 at 1:22 PM
Updated March 6, 2024 at 8:44 PM
Resolved March 3, 2023 at 4:58 PM

Activity

Kamil Holubicki September 22, 2023 at 11:25 AM

A post-merge fix will be provided in 8.0.34 by this PR:
https://github.com/percona/galera/pull/270

Neil Billett September 22, 2023 at 8:23 AM

Hi,

As per the discussion here: https://forums.percona.com/t/occasional-db-crashes-in-pxc-8-0-32-around-remote-endpoint-transport-endpoint-is-not-connected/25158 we believe this is still an issue in PXC 8.0.32.

I’ve been able to reproduce the crash with a two-node cluster on PXC 8.0.32 using nmap’s TCP connect scan against port 4567 on both nodes.

I’ve got our application connected to <node1> (generating some writeset changes) and if I leave this running from a third host:

while true; do nmap -T2 -sT <node1> -p4567; nmap -T2 -sT <node2> -p4567; done

…I see a lot of these entries in both node logs as the commands loop:

[Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected

…and after some time (usually minutes) <node1> falls over, e.g.:

2023-09-21T16:40:43.106276+01:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2023-09-21T16:40:44.811107+01:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2023-09-21T16:40:46.527115+01:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
terminate called after throwing an instance of 'std::system_error'
what(): remote_endpoint: Transport endpoint is not connected
2023-09-21T16:40:48.235578+01:00 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2023-09-21T15:40:48Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=df9f6877fc91c9a71d439f27569eabdef408f622
Server Version: 8.0.32-24.2 Percona XtraDB Cluster (GPL), Release rel24, Revision 2119e75, WSREP version 26.1.4.3, wsrep_26.1.4.3
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 0 thread_stack 0x80000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x2253a31]
/usr/sbin/mysqld(print_fatal_signal(int)+0x39f) [0x1262d0f]
/usr/sbin/mysqld(handle_fatal_signal+0xd8) [0x1262df8]
/lib64/libpthread.so.0(+0x12cf0) [0x7f7a1e6f9cf0]
/lib64/libc.so.6(gsignal+0x10f) [0x7f7a1caa7aff]
/lib64/libc.so.6(abort+0x127) [0x7f7a1ca7aea5]
/lib64/libstdc++.so.6(+0x9009b) [0x7f7a1d44909b]
/lib64/libstdc++.so.6(+0x9653c) [0x7f7a1d44f53c]
/lib64/libstdc++.so.6(+0x96597) [0x7f7a1d44f597]
/lib64/libstdc++.so.6(+0x967f8) [0x7f7a1d44f7f8]
/usr/lib64/galera4/libgalera_smm.so(+0x922cf) [0x7f7a0f5872cf]
/usr/lib64/galera4/libgalera_smm.so(+0x92d7c) [0x7f7a0f587d7c]
/usr/lib64/galera4/libgalera_smm.so(+0xa6885) [0x7f7a0f59b885]
/usr/lib64/galera4/libgalera_smm.so(+0xb3c98) [0x7f7a0f5a8c98]
/usr/lib64/galera4/libgalera_smm.so(+0x8e400) [0x7f7a0f583400]
/usr/lib64/galera4/libgalera_smm.so(+0x8e6b3) [0x7f7a0f5836b3]
/usr/lib64/galera4/libgalera_smm.so(+0x1c15ae) [0x7f7a0f6b65ae]
/usr/lib64/galera4/libgalera_smm.so(+0x1c16d6) [0x7f7a0f6b66d6]
/lib64/libpthread.so.0(+0x81ca) [0x7f7a1e6ef1ca]
/lib64/libc.so.6(clone+0x43) [0x7f7a1ca92e73]
You may download the Percona XtraDB Cluster operations manual by visiting http://www.percona.com/software/percona-xtradb-cluster/. You may find information in the manual which will help you identify the cause of the crash.
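
To see how many of these warnings a node logged before it aborted, the error log can be scanned for the two messages quoted above. This is only a minimal sketch: the helper names count_accept_failures and log_shows_crash and the default log path are assumptions, so adjust the path to the node's log_error setting.

```shell
#!/usr/bin/env bash
# Warning text copied from the log excerpt above.
ACCEPT_WARNING='Failed to accept: remote_endpoint: Transport endpoint is not connected'
# Marker printed when the uncaught exception aborts mysqld.
CRASH_MARKER='terminate called after throwing an instance of'

# Print how many lines of the given log file contain the accept warning.
# "|| true" keeps the exit status at 0 when grep finds no match.
count_accept_failures() {
    grep -cF -- "$ACCEPT_WARNING" "$1" || true
}

# Succeed (exit status 0) when the log file contains the crash marker.
log_shows_crash() {
    grep -qF -- "$CRASH_MARKER" "$1"
}

# Hypothetical default path; pass the real error log as the first argument.
LOG_FILE="${1:-/var/log/mysqld.log}"
if [ -r "$LOG_FILE" ]; then
    echo "accept failures: $(count_accept_failures "$LOG_FILE")"
    if log_shows_crash "$LOG_FILE"; then
        echo "crash marker found"
    fi
fi
```

Running it against each node's log while the nmap loop is active gives a rough picture of how many failed accepts precede the abort.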

Hope it's helpful!

 

yoann.lacancellera March 10, 2023 at 11:08 AM

Thank you, this really helps

 

Sadly, nothing was documented in the PXC releases about the WSREP version, and the variables still show 26.4.3:

 

| version_comment          | Percona XtraDB Cluster binary (GPL) 8.0.30, Revision aff6a8b, WSREP version 26.4.3 |

I found out I should have checked the GALERA_VERSIONS file in the submodules.

 

Kamil Holubicki March 3, 2023 at 4:57 PM

Minimal testcase assuming default Galera communication port 4567:

  1. Start a single-node cluster

  2. while true; do nmap -p4567 127.0.0.1; done

It crashes immediately.

The issue was fixed by Galera upstream commit 930c016108d7086b472ad7a8b9d0f6989202b48a, and the fix is included in Galera 26.4.12, so:

8.0.27 -> galera 26.4.10 - failure
8.0.28 -> galera 26.4.11 - failure
8.0.29 -> galera 26.4.12 - works fine
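
The mapping above can be turned into a quick check of whether a given Galera version already contains the fix. This is a minimal sketch, assuming GNU sort (for the -V version sort) is available; the names FIXED_GALERA and galera_has_fix are illustrative and not part of any Percona or Galera tooling.

```shell
#!/usr/bin/env bash
# First Galera release that contains the fix, per the comment above.
FIXED_GALERA="26.4.12"

# Return 0 (success) when the given Galera version is >= FIXED_GALERA.
# The smaller of the two versions sorts first under GNU sort -V.
galera_has_fix() {
    local version="$1"
    local lowest
    lowest="$(printf '%s\n%s\n' "$FIXED_GALERA" "$version" | sort -V | head -n 1)"
    [ "$lowest" = "$FIXED_GALERA" ]
}

# The three versions listed in the mapping above.
for v in 26.4.10 26.4.11 26.4.12; do
    if galera_has_fix "$v"; then
        echo "galera $v: contains the fix"
    else
        echo "galera $v: affected"
    fi
done
```

Note that the input has to be the Galera library version, not the WSREP version printed in version_comment, which still shows 26.4.3.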
