Implement --timeout on xbcloud

Description

xbcloud gets stuck while uploading a backup with 1M tables.

To repeat the issue:

 

 

 

 

Environment

None

Activity

Marcelo Altmann June 28, 2023 at 4:06 PM

TL;DR

Xtrabackup, more specifically xbcloud, is waiting on data from the object storage (Minio). There seems to be a bug in Minio where it receives a GET request but never replies.

This revealed that xbcloud sets no timeout on its curl calls. This task tracks implementing one.

 

Long Description:

xbcloud is stuck in curl_multi_poll, which is essentially waiting for data on the TCP connection:

Checking strace confirms that curl_multi_poll is waiting for an epoll event on the socket fd and that the wait is timing out:
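To illustrate the kind of bounded wait this ticket asks for, here is a minimal sketch in plain POSIX C. It is not xbcloud code and does not use libcurl: wait_readable and demo_silent_peer are hypothetical names, and poll() stands in for the epoll wait seen inside curl_multi_poll. The idea is only that every wait on the socket is checked against an overall deadline, so a peer that never answers produces a timeout instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds from a monotonic clock, so wall-clock changes cannot move the deadline. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper (an assumption, not taken from xbcloud): wait until fd
 * has something pending or until timeout_ms has elapsed in total.
 * Returns 1 if poll reported an event, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    assert(timeout_ms >= 0);
    const long long deadline = now_ms() + timeout_ms;

    for (;;) {
        long long remaining = deadline - now_ms();
        if (remaining <= 0)
            return 0; /* deadline passed with no data: report a timeout */

        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, (int)remaining);
        if (rc > 0)
            return 1; /* data, EOF, or a socket error is pending on fd */
        if (rc < 0 && errno != EINTR)
            return -1;
        /* poll timed out or was interrupted: loop and re-check the deadline */
    }
}

/* Usage sketch: a connected socket pair whose peer stays silent, like the stalled GET. */
static int demo_silent_peer(int timeout_ms)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int rc = wait_readable(sv[0], timeout_ms); /* nothing is ever written to sv[1] */
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling demo_silent_peer(100) returns 0 after roughly 100 ms, which is the outcome wanted here: the caller gets control back and can report an error or retry instead of waiting forever.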

Looking at tcpdump, we can see that xbcloud successfully sent the GET request to the Minio server at 2023-06-28 11:33:01.574325 and that the server ACKed that packet (seq n 239029 at 2023-06-28 11:33:01.615522). After that, TCP keep-alive packets were exchanged every 15 seconds until xbcloud timed out (this version of xbcloud has curl_easy_setopt(curl, CURLOPT_TIMEOUT, 60L); set when creating a curl connection):

 

Immediately after the connection was closed, Minio replied:

So this is an issue in Minio rather than in xbcloud. However, we should have a timeout in place to keep the backup from getting stuck. After the timeout, we eventually get an error message:

Done

Details

Assignee

Reporter

Needs Review

Yes

Fix versions

Priority

Created June 27, 2023 at 2:05 PM
Updated March 6, 2024 at 6:08 PM
Resolved July 26, 2023 at 9:27 AM