pod-0 on replica does not automatically reconnect to source after I re-create it

Description

After I kill pod-0 on the replica cluster, it does not automatically reconnect to the source.

This bug is about async replication between clusters:

https://www.percona.com/doc/kubernetes-operator-for-pxc/replication.html
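For context, the replica cluster in my setup points at the source through a replication channel in the custom resource, roughly like this (the channel name and host below are the ones from my environment; check the linked docs for the exact schema):

```yaml
spec:
  pxc:
    replicationChannels:
      - name: pxc2_to_pxc1
        isSource: false
        sourcesList:
          - host: 34.72.164.20   # source cluster endpoint from my setup
            port: 3306
            weight: 100
```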


I had a working setup where rcluster2-pxc-0 was connected to the source.

To check what happens on failure, I brutally killed this pod:

kubectl delete pod rcluster2-pxc-0 --force
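After the forced delete, the StatefulSet recreates the pod; these are the commands I would use to watch that and then inspect replication inside the new pod (container name and credentials are from my environment, adjust as needed):

```shell
# Watch the StatefulSet recreate rcluster2-pxc-0
kubectl get pod rcluster2-pxc-0 -w

# Once it is Running, check the async channel inside the pod
kubectl exec -it rcluster2-pxc-0 -c pxc -- \
  mysql -uroot -p -e "SHOW REPLICA STATUS\G"
```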


My initial expectation was that when this pod becomes unavailable,
pod-1 or pod-2 will connect to the source. Unfortunately this did not happen;
probably it was not designed to do that, but if that is the case, it is an oversight.

Pod-0 may be unavailable for a long period of time, and I believe we should still continue to receive replication events from the source.

My second expectation was that after pod-0 restarts, it will automatically reconnect to the source.

This also did not happen.

Replication on this pod was stopped.

This is the status I see:

mysql> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State:
                  Source_Host: 34.72.164.20
                  Source_User: replication
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000009
          Read_Source_Log_Pos: 196
               Relay_Log_File: rcluster2-pxc-0-relay-bin-pxc2_to_pxc1.000023
                Relay_Log_Pos: 68208992
        Relay_Source_Log_File: binlog.000009
           Replica_IO_Running: No
          Replica_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 68208783
              Relay_Log_Space: 68209504
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
           Source_SSL_CA_File:
           Source_SSL_CA_Path:
              Source_SSL_Cert:
            Source_SSL_Cipher:
               Source_SSL_Key:
        Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Source_Server_Id: 0
                  Source_UUID: 26e9bc23-fb73-11eb-8462-b6d5ee5f135d
             Source_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State:
           Source_Retry_Count: 3
                  Source_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Source_SSL_Crl:
           Source_SSL_Crlpath:
           Retrieved_Gtid_Set: 2942f3e5-fb73-11eb-89bd-46bdefcf529a:1-15374
            Executed_Gtid_Set: 2942f3e5-fb73-11eb-89bd-46bdefcf529a:1-15374,
                               68b9e9f0-fb74-11eb-88f8-324aacb070b9:1-313,
                               6a95f35e-fb74-11eb-b5eb-c72eb0f67fba:1-9
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: pxc2_to_pxc1
           Source_TLS_Version:
       Source_public_key_path:
        Get_Source_public_key: 0
            Network_Namespace:
1 row in set (0.00 sec)
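As a manual workaround while the channel is stopped (my own step, not something the operator docs prescribe), the channel can be restarted by hand inside the pod:

```sql
-- Manual restart of the stopped async channel (workaround, not the operator's mechanism)
START REPLICA FOR CHANNEL 'pxc2_to_pxc1';
SHOW REPLICA STATUS FOR CHANNEL 'pxc2_to_pxc1'\G
```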


Environment

None

Smart Checklist

Activity


Sergey Pronin August 23, 2021 at 10:40 AM

We need to identify the defaults and allow users to tune parameters.
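Note that the status output above shows Connect_Retry: 60 and Source_Retry_Count: 3, i.e. the IO thread gives up after roughly three minutes of failed reconnect attempts. If those are indeed the settings in play (my assumption), they can be raised per channel with MySQL 8.0.23+ syntax:

```sql
-- Hypothetical tuning: widen the reconnect window for the async channel
STOP REPLICA IO_THREAD FOR CHANNEL 'pxc2_to_pxc1';
CHANGE REPLICATION SOURCE TO
  SOURCE_CONNECT_RETRY = 60,     -- seconds between reconnect attempts
  SOURCE_RETRY_COUNT = 86400     -- retry far longer before giving up
  FOR CHANNEL 'pxc2_to_pxc1';
START REPLICA IO_THREAD FOR CHANNEL 'pxc2_to_pxc1';
```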

Vadim Tkachenko August 13, 2021 at 5:57 PM

So I tested it further, and it seems the first part also works; it just takes longer to switch the replica to another pod.

Vadim Tkachenko August 12, 2021 at 7:17 PM

Actually, while I was filing this bug, pod-0 reconnected to the source; it just took a while to do so.

But the first part of this report is still valid: I expected pod-1 or pod-2 to take over. Or does it also take a prolonged period of time to reconnect?

Cannot Reproduce

Details

Assignee

Reporter

Affects versions

Priority


Created August 12, 2021 at 7:04 PM
Updated March 5, 2024 at 5:46 PM
Resolved August 23, 2021 at 10:40 AM
