Orchestrator RestartReplicationQuick fails with Error 1065 (query was empty)

Description

If the Orchestrator host cannot reach the Source database and the Replica is lagging, Orchestrator calls runEmergentOperations, takes the UnreachableMasterWithLaggingReplicas case, and then fails in RestartReplicationQuick with the following error:

2024-10-23 00:09:05 DEBUG analysis: ClusterName: node1:3306, IsMaster: true, LastCheckValid: false, LastCheckPartialSuccess: false, CountReplicas: 1, CountValidReplicas: 1, CountValidReplicatingReplicas: 1, CountLaggingReplicas: 1, CountDelayedReplicas: 0, CountReplicasFailingToConnectToMaster: 0
2024-10-23 00:09:05 INFO executeCheckAndRecoverFunction: proceeding with UnreachableMasterWithLaggingReplicas detection on node1:3306; isActionable?: false; skipProcesses: false
2024-10-23 00:09:05 INFO checkAndExecuteFailureDetectionProcesses: could not register UnreachableMasterWithLaggingReplicas detection on node1:3306
2024-10-23 00:09:05 INFO executeCheckAndRecoverFunction: proceeding with UnreachableMasterWithLaggingReplicas recovery on node1:3306; isRecoverable?: false; skipProcesses: false
2024-10-23 00:09:05 ERROR ExecNoPrepare(default:3306) : Error 1065 (42000): Query was empty
2024-10-23 00:09:05 ERROR default:3306: RestartReplicationQuick: '""' failed: Error 1065 (42000): Query was empty
2024-10-23 00:09:05 INFO auditType:emergently-restart-replication-topology-instance instance:default:3306 cluster:node1:3306 message:UnreachableMasterWithLaggingReplicas

How to repeat:

  1. Deploy the latest Percona Orchestrator:

./anydbver update
./anydbver deploy ps:8.0 node1 ps:8.0,master=node0 node2 ps:8.0,master=node1 node3 percona-orchestrator:latest,master=node0

  2. Set the Orchestrator option ReplicationLagQuery to produce lag artificially:

$ rpm -qa | grep -i orc
percona-orchestrator-client-3.2.6-14.el8.x86_64
percona-orchestrator-3.2.6-14.el8.x86_64
percona-orchestrator-cli-3.2.6-14.el8.x86_64

"ReplicationLagQuery": "SELECT /*+ MAX_EXECUTION_TIME(3000) */ slave_lag_seconds FROM test.status",

  3. On the Source, create the status table:

CREATE DATABASE test;
USE test;
CREATE TABLE status ( slave_lag_seconds int DEFAULT NULL );
INSERT INTO status values (2000);

  4. On the Source, prevent the Orchestrator node from connecting to the server:

$ yum install iptables-services
$ systemctl start iptables
$ iptables -I FORWARD -s <orchestrator-host-ip> -j REJECT
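
Before waiting for the error to appear, it can help to confirm from the Orchestrator host that the Source is actually unreachable. The following is a minimal sketch, not part of the original report: can_reach is a hypothetical helper name, the 3-second timeout is an arbitrary choice, and it assumes bash (with /dev/tcp support) and coreutils timeout are available on the Orchestrator host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): returns 0 if a TCP
# connection to host:port can be opened within 3 seconds, non-zero otherwise.
can_reach() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP socket; the subshell exits
  # immediately after the connection attempt succeeds or fails.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage on the Orchestrator host (replace the placeholder with the
# Source address; 3306 is the port shown in the log above):
#   if can_reach <source-host-ip> 3306; then
#     echo "Source still reachable; the blocking rule is not in effect"
#   else
#     echo "Source unreachable; Orchestrator should report UnreachableMasterWithLaggingReplicas"
#   fi
```

If the helper still reports the Source as reachable, the blocking rule is probably not matching the Orchestrator traffic, and the failure described above would not be expected to reproduce.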

I also tested with the percona-orchestrator-3.2.6-13.el8.x86_64.rpm package and did not see this issue.

Environment

None

AFFECTED CS IDs

CS0050200

Activity

Kamil Holubicki November 8, 2024 at 9:05 AM

Will be fixed by DISTMYSQL-466

Done

Details

Needs QA

No

Created October 23, 2024 at 12:41 AM
Updated January 14, 2025 at 10:17 AM
Resolved November 8, 2024 at 9:05 AM
