Operator doesn't realize there is a failed replica.

Description

If a replica fails, the operator does not notice and keeps reporting the pod as "ready" (1/1 in the READY column).

How to reproduce (observed on 1.3.0). Starting from a working environment, follow these steps.

1- Manually remove the pg_wal directory from a replica.

2- Connect to the primary and create a table.

3- Check the error log on the replica.

4- Check the pods' output (a command sketch for these steps follows).
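
A minimal sketch of the steps above. The primary access path (deploy/cluster1), the data directory path (/pgdata/cluster1), the -U postgres user, and the t1 column definition are assumptions and will differ per deployment; only the replica pod name and the t1 table name come from this report.

# Step 1: delete pg_wal on the replica (data path is an assumption)
kubectl exec cluster1-repl1-6b9677bcf7-9rnnk -- bash -c "rm -rf /pgdata/cluster1/pg_wal"

# Step 2: create a table on the primary (access path and column list are assumptions)
kubectl exec deploy/cluster1 -- psql -U postgres -c "CREATE TABLE t1 (id int);"

# Step 3: inspect the replica's error log
kubectl logs cluster1-repl1-6b9677bcf7-9rnnk

# Step 4: check pod status and note the READY column
kubectl get pods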

The expected outcome is for pod cluster1-repl1-6b9677bcf7-9rnnk to show 0/1 in the READY column; instead it shows 1/1.
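
Independently of what the operator reports, the primary can confirm that the replica has detached: a replica that stopped streaming disappears from pg_stat_replication. A sketch, assuming the same access path as above:

# The broken replica should be absent from (or stuck in) this output,
# even though kubectl still reports its pod as 1/1 READY.
kubectl exec deploy/cluster1 -- psql -U postgres -c "SELECT client_addr, state, replay_lsn FROM pg_stat_replication;"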

The broken replica is indeed not receiving replication events: the t1 table never appears there, whereas on the other replica it is listed. A verification sketch follows.
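
To compare the two replicas directly, list the tables on each. Here cluster1-repl2-xxxxxxxxxx-xxxxx is a hypothetical name standing in for the healthy replica pod, and the -U postgres user is an assumption:

# Broken replica: t1 is missing
kubectl exec cluster1-repl1-6b9677bcf7-9rnnk -- psql -U postgres -c "\dt"

# Healthy replica (hypothetical pod name): t1 is listed
kubectl exec cluster1-repl2-xxxxxxxxxx-xxxxx -- psql -U postgres -c "\dt"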

This did not happen in v1.1.0, at least; I haven't checked 1.2.0 but can do so if necessary.

Environment

None

AFFECTED CS IDs

CS0029389


Created August 15, 2022 at 11:38 PM
Updated July 15, 2024 at 3:37 PM