Nodes are unable to connect to PMM Server deployed on OVA

Description

We recently received a report from a community user: https://forums.percona.com/t/nodes-are-unable-to-connect-server-failed-to-establish-two-way-communication-channel/12437/1

We need to understand what is causing the connection to fail. The pmm-agent logs provided by the user are attached to this ticket.

 

Impact on User: Unable to establish a connection between nodes and pmm-server. 

How to test

None

How to document

None

Attachments

2

Activity

Artem Meshcheryakov January 11, 2022 at 2:27 AM

Hi guys,

Not opening a new request, since all symptoms match exactly this case.

We are also on PMM 2.22 currently. Suddenly, after a PMM2 server restart, PMM Agents cannot connect to it, with the following symptoms:

What I see in PMM2 logs:

pmm-managed.log

nginx.log

So it looks good. Direct connections are also OK:

PMM Agent service restart does not help.
Also this operation is successful:

But the error cannot be resolved.

It is magically fixed with another PMM2 restart.

What is specific to our environment: PMM2 server is running inside GKE infra, installed via Helm chart from "https://percona-charts.storage.googleapis.com", version 2.21.0.
The issue is triggered by GKE node auto-upgrades: the container is restarted automatically, which breaks the connectivity. Restarting it again helps.

I think this is somehow specific to our environment, but I lack the troubleshooting details to understand what exactly is causing it.

If you can take a look at the outputs and point to some specific things to check, I would highly appreciate it.

Lalit Choudhary October 7, 2021 at 8:38 AM

Hi

From the attached pmm-agent log I can see that the agent restarted and that there were connection errors to pmm-server, but the connection was successfully established after some time, and we can also see that it was running fine for a while.

 

 

Looking at the errors, this looks like a network connection issue between the nodes and pmm-server, since the agent starts working again after some time and makes a successful connection.

Let me know if the above is not the case.
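
One way to check whether the failures line up with transient network problems is to probe the server address from an affected node at regular intervals and compare the timestamps with the errors in the pmm-agent log. The following is a minimal sketch in Python, not part of PMM: the function names `probe_tcp` and `probe_repeatedly` are hypothetical, and the host name and port in the usage note are assumptions to be replaced with the values for your setup. It only tests plain TCP reachability, so a successful probe does not prove that the agent's two-way channel can be established.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_repeatedly(host, port, attempts=5, interval=1.0, timeout=3.0):
    """Probe `attempts` times and return a list of (unix_timestamp, reachable) tuples."""
    results = []
    for i in range(attempts):
        results.append((time.time(), probe_tcp(host, port, timeout)))
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

For example, `probe_repeatedly("pmm-server.example.com", 443, attempts=60, interval=5.0)` would sample reachability for about five minutes; the host name and port 443 here are placeholders. If every probe succeeds while pmm-agent still reports connection errors, as in the GKE case described in this ticket, the cause is more likely above the TCP layer than in basic network connectivity.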

 

 

 

Details

Created October 6, 2021 at 10:27 AM
Updated February 13, 2024 at 3:51 PM