SST initialization delay

Description

Hi Team,

We need to speed up SST initialization significantly.
The issue is that when I start a PXC cluster, each new PXC instance waits up to 2 minutes before it starts transferring SST info.
On some systems, sst-initial-timeout has to be raised to as much as 240 seconds!
The delay is on the donor side.

Please see the logs below.
Stage 1: the donor and joiner negotiated.
DONOR below

JOINER below

Now the DONOR starts waiting for something...

DONOR below

But sst-initial-timeout=120 was not enough, and the JOINER failed.
JOINER below

We need to get this issue solved; we are receiving bad feedback from early adopters of the Kubernetes Operator.

Environment

None

Attachments

4


Activity


KennT July 25, 2019 at 4:01 AM

commit 4ac074ffad7315b82092ff08cb590cd7fa6eba27
Author: Kenn Takara <kenn.takara@percona.com>
Date: Tue Jul 23 01:11:55 2019 -0700

: SST initialization delay

Issue
In certain cases (most notably, running the PXC docker images for
kubernetes), the SST logic to find the parent pid would fail (due
to the pid being at the end of the list of pids).

Solution
Change the grep to look for the pids correctly.
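The failure mode described in the commit message can be illustrated with a small sketch. The actual grep pattern used in the SST script is not shown in this issue, so the "pid followed by a space" check below is an assumption chosen to reproduce the described symptom (a pid at the end of the list is not found). The whole-word version uses Python's `\b` as an analogue of `grep -w`, which was the option suggested in the comments; both function names are hypothetical.

```python
import re

def pid_listed_trailing_space(pid: int, pids: str) -> bool:
    # HYPOTHETICAL buggy check: the pid must be followed by a space,
    # much like grepping for "$pid ". The last pid in the list has no
    # trailing space, so it is never found; "303 " also matches inside
    # "1303 ", giving a false positive.
    return f"{pid} " in pids

def pid_listed_whole_word(pid: int, pids: str) -> bool:
    # Whole-word match, analogous to `grep -w`: the pid must be bounded
    # by non-word characters or by the start/end of the string.
    return re.search(rf"\b{pid}\b", pids) is not None

if __name__ == "__main__":
    pids = "101 202 303"  # hypothetical space-separated pid list
    print(pid_listed_trailing_space(303, pids))  # False: last pid missed
    print(pid_listed_whole_word(303, pids))      # True
```

Matching on word boundaries fixes both problems at once: a pid at the end of the line is found, and a pid that is only a substring of a longer pid is not.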

Mykola Marzhan July 2, 2019 at 5:28 PM

I believe -w is a good option; I like it.

Ramesh Sivaraman June 27, 2019 at 9:17 AM

I was able to reproduce the SST delay using the Kubernetes Operator with the .7.25 and .7.26 images. Attached are the test logs for the .7.26 images.

Mykola Marzhan June 24, 2019 at 5:43 AM

Kenn, please look at the logs carefully.
The delay is inside the Galera donor (not the backup script).

KennT June 24, 2019 at 12:39 AM

From the logs, it appears that the wait_for_listen() call is still failing (hence the long wait).

Could you try this with the 5.7.26 build? It has more debugging output from the wsrep_sst_xtrabackup-v2 script.
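Why a failing readiness check shows up as a long delay rather than an immediate error can be seen in a generic polling loop. This is a hypothetical sketch, not the script's actual wait_for_listen() implementation: `poll_until_listening`, its `interval` parameter, and the injectable `clock`/`sleep` are assumptions made for illustration.

```python
import time
from typing import Callable

def poll_until_listening(
    is_listening: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    # HYPOTHETICAL helper, not the script's wait_for_listen(): poll a
    # readiness check until it succeeds or `timeout` seconds have passed.
    deadline = clock() + timeout
    while True:
        if is_listening():
            return True   # listener found: stop waiting immediately
        if clock() >= deadline:
            return False  # check never succeeded: the full timeout was spent
        sleep(interval)
```

If `is_listening` can never succeed (for example, because the process it looks for is never matched), every call runs for the entire `timeout` before giving up, which is consistent with the donor-side wait described in this issue.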

Done

Details

Assignee

Reporter

Time tracking

1w 7h 4m logged

Priority


Created June 3, 2019 at 6:44 AM
Updated March 6, 2024 at 10:09 PM
Resolved July 25, 2019 at 3:08 AM