Status: Done
Assignee: Kamil Holubicki
Reporter: Juan Arruti
Needs QA: No
Sprint: None
Priority: Medium
Created October 23, 2024 at 12:41 AM
Updated January 14, 2025 at 10:17 AM
Resolved November 8, 2024 at 9:05 AM
If the Orchestrator host cannot reach the source database and the replica is lagging, Orchestrator calls runEmergentOperations (case UnreachableMasterWithLaggingReplicas). The operation then fails in RestartReplicationQuick with the following error:
2024-10-23 00:09:05 DEBUG analysis: ClusterName: node1:3306, IsMaster: true, LastCheckValid: false, LastCheckPartialSuccess: false, CountReplicas: 1, CountValidReplicas: 1, CountValidReplicatingReplicas: 1, CountLaggingReplicas: 1, CountDelayedReplicas: 0, CountReplicasFailingToConnectToMaster: 0
2024-10-23 00:09:05 INFO executeCheckAndRecoverFunction: proceeding with UnreachableMasterWithLaggingReplicas detection on node1:3306; isActionable?: false; skipProcesses: false
2024-10-23 00:09:05 INFO checkAndExecuteFailureDetectionProcesses: could not register UnreachableMasterWithLaggingReplicas detection on node1:3306
2024-10-23 00:09:05 INFO executeCheckAndRecoverFunction: proceeding with UnreachableMasterWithLaggingReplicas recovery on node1:3306; isRecoverable?: false; skipProcesses: false
2024-10-23 00:09:05 ERROR ExecNoPrepare(default:3306) : Error 1065 (42000): Query was empty
2024-10-23 00:09:05 ERROR default:3306: RestartReplicationQuick: '""' failed: Error 1065 (42000): Query was empty
2024-10-23 00:09:05 INFO auditType:emergently-restart-replication-topology-instance instance:default:3306 cluster:node1:3306 message:UnreachableMasterWithLaggingReplicas
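The failure path can be modeled roughly as follows. This is a simplified Python sketch, not Orchestrator's Go code. The condition in `is_unreachable_master_with_lagging_replicas` is an assumption built from the field names in the DEBUG analysis line. `restart_replication_quick` and its statement list are hypothetical.

MySQL returns Error 1065 (42000) "Query was empty" when it receives an empty statement. The `'""'` in the RestartReplicationQuick error appears to be that empty statement, so the sketch rejects empty statements before anything is sent to the server.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Analysis:
    # Field names mirror the DEBUG "analysis:" log line (hypothetical model).
    is_master: bool
    last_check_valid: bool
    count_replicas: int
    count_valid_replicating_replicas: int
    count_lagging_replicas: int
    count_delayed_replicas: int


def is_unreachable_master_with_lagging_replicas(a: Analysis) -> bool:
    """Assumed condition: the master check failed, every replica is lagging,
    not all replicas are intentionally delayed, and at least one replica
    is still replicating."""
    return (
        a.is_master
        and not a.last_check_valid
        and a.count_lagging_replicas == a.count_replicas
        and a.count_delayed_replicas < a.count_replicas
        and a.count_valid_replicating_replicas > 0
    )


def restart_replication_quick(statements: List[str],
                              execute: Callable[[str], None]) -> None:
    """Hypothetical guard: validate every statement before running any,
    so an empty one never reaches MySQL (Error 1065: Query was empty)."""
    empty = [i for i, stmt in enumerate(statements) if not stmt.strip()]
    if empty:
        raise ValueError(
            f"RestartReplicationQuick: empty SQL statement at index {empty}")
    for stmt in statements:
        execute(stmt)
```

With the values from the log (IsMaster true, LastCheckValid false, and one replica that is both lagging and replicating), the classifier returns True. Checking all statements up front avoids stopping the IO thread and then failing to start it again.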
How to repeat:
Deploy the latest Percona Orchestrator with anydbver:
./anydbver update
./anydbver deploy ps:8.0 node1 ps:8.0,master=node0 node2 ps:8.0,master=node1 node3 percona-orchestrator:latest,master=node0
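The deploy command above builds a chained topology. The sketch below spells out one reading of its argument list. The parsing rules are an assumption about anydbver's syntax, not its documented grammar:

- the first spec belongs to node0;
- each `nodeN` token starts a new node;
- `master=` names the node to replicate from.

`parse_deploy_args` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def parse_deploy_args(args: List[str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map node name -> (product spec, value of the master= option).
    Assumed reading of anydbver's 'deploy' arguments (hypothetical)."""
    nodes: Dict[str, Tuple[str, Optional[str]]] = {}
    current = "node0"  # specs before the first nodeN token belong to node0
    for token in args:
        if token.startswith("node") and token[4:].isdigit():
            current = token  # start the definition of the next node
            continue
        product, *options = token.split(",")
        master = None
        for opt in options:
            key, _, value = opt.partition("=")
            if key == "master":
                master = value
        nodes[current] = (product, master)
    return nodes
```

Under this reading:

- node1 replicates from node0;
- node2 replicates from node1, so node1 acts as an intermediate source;
- node3 runs Orchestrator, and master=node0 presumably points it at the topology to discover.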
Installed Orchestrator packages:
$ rpm -qa | grep -i orc
percona-orchestrator-client-3.2.6-14.el8.x86_64
percona-orchestrator-3.2.6-14.el8.x86_64
percona-orchestrator-cli-3.2.6-14.el8.x86_64
To produce replication lag artificially, set the Orchestrator option ReplicationLagQuery so that it reads the lag from a table:
"ReplicationLagQuery": "SELECT /*+ MAX_EXECUTION_TIME(3000) */ slave_lag_seconds FROM test.status",
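To apply the option, add the ReplicationLagQuery key to Orchestrator's JSON configuration. Below is a minimal Python sketch, under these assumptions:

- the configuration is a plain JSON file without comments, such as /etc/orchestrator.conf.json (the path is an assumption; adjust it to the actual deployment);
- rewriting the file with `json.dumps` is acceptable.

`set_replication_lag_query` is a hypothetical helper.

```python
import json
from pathlib import Path

# Query from this report; reads the artificial lag from test.status.
LAG_QUERY = ("SELECT /*+ MAX_EXECUTION_TIME(3000) */ "
             "slave_lag_seconds FROM test.status")


def set_replication_lag_query(config_path: Path, query: str = LAG_QUERY) -> dict:
    """Load a JSON config, set ReplicationLagQuery, and write it back.
    All other keys are preserved as-is."""
    config = json.loads(config_path.read_text())
    config["ReplicationLagQuery"] = query
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Example (hypothetical path): `set_replication_lag_query(Path("/etc/orchestrator.conf.json"))`.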
On the source, create the status table:
CREATE DATABASE test;
USE test;
CREATE TABLE status (
  slave_lag_seconds int DEFAULT NULL
);
INSERT INTO status VALUES (2000);
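Because the status table is created on the source, the row replicates to every replica. ReplicationLagQuery therefore reports 2000 seconds on each of them.

The sketch below shows how that value would be counted as lagging. It assumes a replica counts as lagging when its reported lag exceeds the ReasonableReplicationLagSeconds setting. The 10-second default used here is an assumption, and `count_lagging` is a hypothetical helper. NULL lag values are skipped, as an SQL comparison against NULL would not count them.

```python
from typing import Iterable, Optional

# Assumed default for Orchestrator's ReasonableReplicationLagSeconds.
REASONABLE_REPLICATION_LAG_SECONDS = 10


def count_lagging(lags: Iterable[Optional[int]],
                  threshold: int = REASONABLE_REPLICATION_LAG_SECONDS) -> int:
    """Count replicas whose reported lag exceeds the threshold.
    None (SQL NULL) values are not counted."""
    return sum(1 for lag in lags if lag is not None and lag > threshold)
```

With one replica reporting 2000 seconds, the count is 1. This matches `CountLaggingReplicas: 1` in the DEBUG analysis line.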
On the source, block connections from the Orchestrator host:
$ yum install iptables-services
$ systemctl start iptables
$ iptables -I FORWARD -s <orchestrator-host-ip> -j REJECT
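It can help to confirm that the rule actually cuts the Orchestrator host off before waiting for the analysis. Below is a minimal Python sketch to run on the Orchestrator host. `can_connect` is a hypothetical helper, and port 3306 is assumed from the log above.

```python
import socket


def can_connect(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.
    Refused, unreachable, and timed-out attempts all raise OSError."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("<source-host>")`, where `<source-host>` is a placeholder for the source's host name. It should return True before the rule is added and False once the REJECT rule takes effect.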
I also tested with the percona-orchestrator-3.2.6-13.el8.x86_64.rpm package, and the issue does not occur with that build.