Replication broken when master is killed on older PG

Description

The self-healing test fails at step “13-read-from-all-pods”: one pod is missing data after network loss was introduced to the master pod in the previous step.

This happens on PostgreSQL 14 and lower.
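For context, the network loss in the preceding step is the kind of fault a chaos-mesh NetworkChaos resource injects against the then-master pod. Below is a minimal sketch under that assumption; the resource name, duration, and pod selector are illustrative and are not taken from the test's actual manifests:

# Hypothetical network-loss injection against the pod assumed to have been the master
# at that point (xkb2); illustrative only, the test's real chaos manifest may differ.
cat <<'EOF' | kubectl -n kuttl-test-healthy-barnacle apply -f -
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: chaos-pod-network-loss
spec:
  action: loss
  mode: one
  selector:
    pods:
      kuttl-test-healthy-barnacle:
        - self-healing-instance1-xkb2-0
  loss:
    loss: "100"
    correlation: "100"
  duration: "60s"
EOF

After the loss window, Patroni promotes another instance and the old master is expected to rejoin as a replica and catch up, which is exactly what does not happen here on PG 14 and lower.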

Here’s how it looks:

logger.go:42: 16:02:47 | self-healing/13-read-from-all-pods | test step failed 13-read-from-all-pods
case.go:364: failed in step 13-read-from-all-pods
case.go:366: --- ConfigMap:kuttl-test-healthy-barnacle/13-read-from-3
+++ ConfigMap:kuttl-test-healthy-barnacle/13-read-from-3
@@ -4,9 +4,18 @@
   100500
   100501
   100502
-  100503
 kind: ConfigMap
 metadata:
+  managedFields:
+  - apiVersion: v1
+    fieldsType: FieldsV1
+    fieldsV1:
+      f:data:
+        .: {}
+        f:data: {}
+    manager: kubectl-create
+    operation: Update
+    time: "2023-12-19T15:02:16Z"
   name: 13-read-from-3
   namespace: kuttl-test-healthy-barnacle
case.go:366: resource ConfigMap:kuttl-test-healthy-barnacle/13-read-from-3: .data.data: value mismatch, expected: 100500 100501 100502 100503 != actual: 100500 100501 100502

And indeed, if we check manually, the data is still missing:

$ kubectl -n kuttl-test-healthy-barnacle exec pg-client-6cc584874-42gpr -- bash -c 'printf '\''\c myapp \\\ SELECT * from myApp;\n'\'' | psql -v ON_ERROR_STOP=1 -t -q postgres://'\''postgres:8KvTH9RTF6CrCL35yiy0qexO@self-healing-instance1-xkb2-0.self-healing-pods.kuttl-test-healthy-barnacle.svc'\'''
 100500
 100501
 100502
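Running the same query against the current leader (self-healing-instance1-l48l-0, per the patronictl output below) is a quick way to confirm that the missing row exists there and the problem is purely replication. A sketch reusing the credentials from the command above, with only the host changed:

$ kubectl -n kuttl-test-healthy-barnacle exec pg-client-6cc584874-42gpr -- bash -c 'printf '\''\c myapp \\\ SELECT * from myApp;\n'\'' | psql -v ON_ERROR_STOP=1 -t -q postgres://'\''postgres:8KvTH9RTF6CrCL35yiy0qexO@self-healing-instance1-l48l-0.self-healing-pods.kuttl-test-healthy-barnacle.svc'\'''

If 100503 shows up there, the write itself succeeded and only the xkb2 replica failed to replay it.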

patronictl shows the following status:

bash-4.4$ patronictl list
+ Cluster: self-healing-ha (7314320279136440409) --------------------------------------------+---------+---------+----+-----------+
| Member                        | Host                                            | Role    | State   | TL | Lag in MB |
+-------------------------------+-------------------------------------------------+---------+---------+----+-----------+
| self-healing-instance1-l48l-0 | self-healing-instance1-l48l-0.self-healing-pods | Leader  | running |  3 |           |
| self-healing-instance1-ps7w-0 | self-healing-instance1-ps7w-0.self-healing-pods | Replica | running |  3 |         0 |
| self-healing-instance1-xkb2-0 | self-healing-instance1-xkb2-0.self-healing-pods | Replica | running |  2 |        32 |
+-------------------------------+-------------------------------------------------+---------+---------+----+-----------+

So the affected pod is lagging and is also stuck on a lower timeline.
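The timeline divergence can be double-checked from inside the lagging pod with something like the following. This is a diagnostic sketch only; the container name and local postgres access are assumptions based on typical operator instance pods:

# Hypothetical timeline/replay check on the lagging replica (container name assumed).
kubectl -n kuttl-test-healthy-barnacle exec self-healing-instance1-xkb2-0 -c database -- \
  psql -t -c "SELECT timeline_id FROM pg_control_checkpoint();" \
       -c "SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();"

A replica reporting timeline_id 2 while the leader is on timeline 3 has not followed the promotion, which matches the missing last row.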

Environment

None

Status

Done

Details

Found by Automation

Yes

Needs QA

Yes

Created December 19, 2023 at 3:14 PM
Updated May 23, 2024 at 4:44 PM
Resolved May 23, 2024 at 4:38 PM
