pod-0 on replica does not automatically reconnect to source after I re-create it

Description

After I kill pod-0 on the replica cluster, it does not automatically reconnect to the source.

This bug is about async replication between clusters:

https://www.percona.com/doc/kubernetes-operator-for-pxc/replication.html
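For context, the replica cluster in my setup points at the source through a replication channel in the custom resource, roughly like this (the channel name and host below are the ones from my environment; check the linked docs for the exact schema):

```yaml
spec:
  pxc:
    replicationChannels:
      - name: pxc2_to_pxc1
        isSource: false
        sourcesList:
          - host: 34.72.164.20   # source cluster endpoint from my setup
            port: 3306
            weight: 100
```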


I had a working setup where rcluster2-pxc-0 was connected to the source.

To check what happens on failure, I brutally killed this pod:

kubectl delete pod rcluster2-pxc-0 --force
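After the forced delete, the StatefulSet recreates the pod; these are the commands I would use to watch that and then inspect replication inside the new pod (container name and credentials are from my environment, adjust as needed):

```shell
# Watch the StatefulSet recreate rcluster2-pxc-0
kubectl get pod rcluster2-pxc-0 -w

# Once it is Running, check the async channel inside the pod
kubectl exec -it rcluster2-pxc-0 -c pxc -- \
  mysql -uroot -p -e "SHOW REPLICA STATUS\G"
```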


My initial expectation was that when this pod becomes unavailable,
pod-1 or pod-2 will connect to the source. Unfortunately this did not happen;
probably it was not designed to do that, but if that is the case, it is an oversight.

Pod-0 may be unavailable for a long period of time, and I believe we should still continue to receive replication events from the source.

My second expectation was that after pod-0 restarts, it will automatically reconnect to the source.

This also did not happen.

Replication on this pod was stopped.

This is the status I see:

mysql> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State:
                  Source_Host: 34.72.164.20
                  Source_User: replication
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000009
          Read_Source_Log_Pos: 196
               Relay_Log_File: rcluster2-pxc-0-relay-bin-pxc2_to_pxc1.000023
                Relay_Log_Pos: 68208992
        Relay_Source_Log_File: binlog.000009
           Replica_IO_Running: No
          Replica_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 68208783
              Relay_Log_Space: 68209504
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
           Source_SSL_CA_File:
           Source_SSL_CA_Path:
              Source_SSL_Cert:
            Source_SSL_Cipher:
               Source_SSL_Key:
        Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Source_Server_Id: 0
                  Source_UUID: 26e9bc23-fb73-11eb-8462-b6d5ee5f135d
             Source_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State:
           Source_Retry_Count: 3
                  Source_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Source_SSL_Crl:
           Source_SSL_Crlpath:
           Retrieved_Gtid_Set: 2942f3e5-fb73-11eb-89bd-46bdefcf529a:1-15374
            Executed_Gtid_Set: 2942f3e5-fb73-11eb-89bd-46bdefcf529a:1-15374,
                               68b9e9f0-fb74-11eb-88f8-324aacb070b9:1-313,
                               6a95f35e-fb74-11eb-b5eb-c72eb0f67fba:1-9
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: pxc2_to_pxc1
           Source_TLS_Version:
       Source_public_key_path:
        Get_Source_public_key: 0
            Network_Namespace:
1 row in set (0.00 sec)
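As a manual workaround while the channel is stopped (my own step, not something the operator docs prescribe), the channel can be restarted by hand inside the pod:

```sql
-- Manual restart of the stopped async channel (workaround, not the operator's mechanism)
START REPLICA FOR CHANNEL 'pxc2_to_pxc1';
SHOW REPLICA STATUS FOR CHANNEL 'pxc2_to_pxc1'\G
```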


Environment

None

Smart Checklist

Activity


Sergey Pronin August 23, 2021 at 10:40 AM

We need to identify the defaults and allow users to tune parameters.
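Note that the status output above shows Connect_Retry: 60 and Source_Retry_Count: 3, i.e. the IO thread gives up after roughly three minutes of failed reconnect attempts. If those are indeed the settings in play (my assumption), they can be raised per channel with MySQL 8.0.23+ syntax:

```sql
-- Hypothetical tuning: widen the reconnect window for the async channel
STOP REPLICA IO_THREAD FOR CHANNEL 'pxc2_to_pxc1';
CHANGE REPLICATION SOURCE TO
  SOURCE_CONNECT_RETRY = 60,     -- seconds between reconnect attempts
  SOURCE_RETRY_COUNT = 86400     -- retry far longer before giving up
  FOR CHANNEL 'pxc2_to_pxc1';
START REPLICA IO_THREAD FOR CHANNEL 'pxc2_to_pxc1';
```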

Vadim Tkachenko August 13, 2021 at 5:57 PM

So I tested it further, and it seems the first part also works; it just takes longer to switch the replica to another pod.

Vadim Tkachenko August 12, 2021 at 7:17 PM

Actually, while I was filing this bug, pod-0 reconnected to the source; it just took a while to do so.

But the first part of this report is still valid: I expected pod-1 or pod-2 to take over. Or does it also take a prolonged period of time to reconnect?

Cannot Reproduce

Details

Assignee

Reporter

Affects versions

Priority


Created August 12, 2021 at 7:04 PM
Updated March 5, 2024 at 5:46 PM
Resolved August 23, 2021 at 10:40 AM
