Add Option to Limit SST Retry Attempts

Description

Hi,

When SST (State Snapshot Transfer) fails several times in a row, for example because of a software bug, the failure is unlikely to resolve itself without manual intervention.

Meanwhile, SST keeps being retried, consuming resources such as network bandwidth and degrading the donor's performance.

The kubelet's --backoff-max-restart-delay is set to 5 minutes by default, which can be too short an interval between restart attempts. As a result, SST can be retried many times within a short timeframe, which may degrade cluster performance.
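
To give a rough sense of how often a crashing node could retry under a 5-minute cap, here is a small sketch of the arithmetic. It assumes the commonly documented Kubernetes restart back-off (starting at 10 seconds and doubling on each restart) and ignores how long each SST attempt itself runs. The function names are hypothetical and only serve the illustration.

```python
def restart_delays(attempts, base_seconds=10, cap_seconds=300):
    """Return the delay (in seconds) before each of the first `attempts` restarts.

    Assumption: the delay starts at `base_seconds`, doubles after every
    restart, and never exceeds `cap_seconds` (5 minutes in this issue).
    """
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        delays.append(min(delay, cap_seconds))
        delay = min(delay * 2, cap_seconds)
    return delays


def restarts_within(window_seconds, base_seconds=10, cap_seconds=300):
    """Count how many restarts fit into a time window.

    Simplification: only the back-off delays are summed; the run time of
    each failed attempt is not included, so this is an upper bound.
    """
    elapsed = 0
    count = 0
    delay = base_seconds
    while elapsed + min(delay, cap_seconds) <= window_seconds:
        elapsed += min(delay, cap_seconds)
        count += 1
        delay = min(delay * 2, cap_seconds)
    return count


if __name__ == "__main__":
    print(restart_delays(8))      # delays before the first eight restarts
    print(restarts_within(3600))  # upper bound on restarts in one hour
```

Under these assumptions the delay reaches the 5-minute cap after the fifth restart, and up to 15 restarts (each potentially triggering a new SST) fit into a single hour.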

Ideally, nodes should have a configurable limit on SST attempts. After a few failed retries, they could stop requesting SST to avoid putting additional pressure on the cluster.
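
As a minimal sketch of what such a limit might look like, the snippet below keeps a failure counter in a small state file so that it survives container restarts. Everything here is hypothetical: the class name, the `max_attempts` setting, and the state-file format are illustrative assumptions, not an existing option or API of any operator or database.

```python
import json
from pathlib import Path


class SSTRetryLimiter:
    """Hypothetical guard that stops requesting SST after repeated failures."""

    def __init__(self, state_file, max_attempts=3):
        # state_file should live on a persistent volume so the counter
        # is not lost when the container restarts (assumption).
        self.state_file = Path(state_file)
        self.max_attempts = max_attempts

    def _load(self):
        # A missing or unreadable state file is treated as "no failures yet".
        try:
            data = json.loads(self.state_file.read_text())
            return int(data["failed_attempts"])
        except (FileNotFoundError, ValueError, KeyError, TypeError):
            return 0

    def _save(self, count):
        self.state_file.write_text(json.dumps({"failed_attempts": count}))

    def should_request_sst(self):
        """Return True while the failure count is below the configured limit."""
        return self._load() < self.max_attempts

    def record_failure(self):
        """Increment and persist the failure counter; return the new value."""
        count = self._load() + 1
        self._save(count)
        return count

    def reset(self):
        """Clear the counter, e.g. after a successful SST or manual intervention."""
        self._save(0)
```

An entrypoint script could call `should_request_sst()` before joining the cluster and, once the limit is reached, wait for an operator to investigate and call `reset()` instead of requesting another transfer from the donor.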

Thanks!

Environment

None

Details

Assignee

Reporter

Needs QA

Yes

Affects versions

Priority

Created 1 hour ago
Updated 1 hour ago
