Replication broken when master is killed on older PG

Description

The self-healing test fails at step “13-read-from-all-pods”: one pod is missing data after network loss was introduced to the master pod in the previous step.

This happens on PostgreSQL 14 and lower.
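For context, the network loss in the preceding step is the kind of fault a chaos-mesh NetworkChaos resource injects against the then-master pod. Below is a minimal sketch under that assumption; the resource name, duration, and pod selector are illustrative and are not taken from the test's actual manifests:

# Hypothetical network-loss injection against the pod assumed to have been the master
# at that point (xkb2); illustrative only, the test's real chaos manifest may differ.
cat <<'EOF' | kubectl -n kuttl-test-healthy-barnacle apply -f -
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: chaos-pod-network-loss
spec:
  action: loss
  mode: one
  selector:
    pods:
      kuttl-test-healthy-barnacle:
        - self-healing-instance1-xkb2-0
  loss:
    loss: "100"
    correlation: "100"
  duration: "60s"
EOF

After the loss window, Patroni promotes another instance and the old master is expected to rejoin as a replica and catch up, which is exactly what does not happen here on PG 14 and lower.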

Here’s how it looks:

logger.go:42: 16:02:47 | self-healing/13-read-from-all-pods | test step failed 13-read-from-all-pods
case.go:364: failed in step 13-read-from-all-pods
case.go:366: --- ConfigMap:kuttl-test-healthy-barnacle/13-read-from-3
+++ ConfigMap:kuttl-test-healthy-barnacle/13-read-from-3
@@ -4,9 +4,18 @@
   100500
   100501
   100502
-  100503
 kind: ConfigMap
 metadata:
+  managedFields:
+  - apiVersion: v1
+    fieldsType: FieldsV1
+    fieldsV1:
+      f:data:
+        .: {}
+        f:data: {}
+    manager: kubectl-create
+    operation: Update
+    time: "2023-12-19T15:02:16Z"
   name: 13-read-from-3
   namespace: kuttl-test-healthy-barnacle
case.go:366: resource ConfigMap:kuttl-test-healthy-barnacle/13-read-from-3: .data.data: value mismatch, expected: 100500 100501 100502 100503 != actual: 100500 100501 100502

And indeed, if we check manually, the data is still missing:

$ kubectl -n kuttl-test-healthy-barnacle exec pg-client-6cc584874-42gpr -- bash -c 'printf '\''\c myapp \\\ SELECT * from myApp;\n'\'' | psql -v ON_ERROR_STOP=1 -t -q postgres://'\''postgres:8KvTH9RTF6CrCL35yiy0qexO@self-healing-instance1-xkb2-0.self-healing-pods.kuttl-test-healthy-barnacle.svc'\'''
 100500
 100501
 100502
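Running the same query against the current leader (self-healing-instance1-l48l-0, per the patronictl output below) is a quick way to confirm that the missing row exists there and the problem is purely replication. A sketch reusing the credentials from the command above, with only the host changed:

$ kubectl -n kuttl-test-healthy-barnacle exec pg-client-6cc584874-42gpr -- bash -c 'printf '\''\c myapp \\\ SELECT * from myApp;\n'\'' | psql -v ON_ERROR_STOP=1 -t -q postgres://'\''postgres:8KvTH9RTF6CrCL35yiy0qexO@self-healing-instance1-l48l-0.self-healing-pods.kuttl-test-healthy-barnacle.svc'\'''

If 100503 shows up there, the write itself succeeded and only the xkb2 replica failed to replay it.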

patronictl shows the following status:

bash-4.4$ patronictl list
+ Cluster: self-healing-ha (7314320279136440409) --------------------------------------------+---------+---------+----+-----------+
| Member                        | Host                                            | Role    | State   | TL | Lag in MB |
+-------------------------------+-------------------------------------------------+---------+---------+----+-----------+
| self-healing-instance1-l48l-0 | self-healing-instance1-l48l-0.self-healing-pods | Leader  | running |  3 |           |
| self-healing-instance1-ps7w-0 | self-healing-instance1-ps7w-0.self-healing-pods | Replica | running |  3 |         0 |
| self-healing-instance1-xkb2-0 | self-healing-instance1-xkb2-0.self-healing-pods | Replica | running |  2 |        32 |
+-------------------------------+-------------------------------------------------+---------+---------+----+-----------+

So the affected pod is lagging and is also stuck on a lower timeline.
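The timeline divergence can be double-checked from inside the lagging pod with something like the following. This is a diagnostic sketch only; the container name and local postgres access are assumptions based on typical operator instance pods:

# Hypothetical timeline/replay check on the lagging replica (container name assumed).
kubectl -n kuttl-test-healthy-barnacle exec self-healing-instance1-xkb2-0 -c database -- \
  psql -t -c "SELECT timeline_id FROM pg_control_checkpoint();" \
       -c "SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();"

A replica reporting timeline_id 2 while the leader is on timeline 3 has not followed the promotion, which matches the missing last row.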

Environment

None

Status

Done

Details

Found by Automation

Yes

Needs QA

Yes

Created December 19, 2023 at 3:14 PM
Updated May 23, 2024 at 4:44 PM
Resolved May 23, 2024 at 4:38 PM
