Make status output consistent

Description

User story
As a PMM user, I want to see status information when pmm-agent can't connect to pmm-server, instead of an error message. Additionally, I want to see the connection uptime between agent and server over a specified time window (24 hours by default).

Current behavior
When a user runs pmm-admin status --json while the agent can't connect to the server, the response looks like this:

Acceptance criteria:

  • When pmm-agent isn't connected to pmm-server and the command pmm-admin status --json is executed, the response should be in JSON format and contain information about connection uptime. For example:

where the 'connection_uptime' field is the percentage of time the agent was connected to the server during a predefined time window (1 hour by default).

  • When the agent can't connect to the server, `node_id`, `server_clock_drift`, `server_latency` and the list of available agents will be empty, because the agent can't get this information from the server.

  • When the connection between agent and server is established, the response from pmm-admin status --json will include the `up_connected_time` field as well. For example:

 

 

Algorithm for calculating connection uptime:

We will store in memory a set of events, one recorded each time the connection status changes, like this:

For example:

 

Then we can calculate the connected time as the interval between a connected event and the following disconnected event.
Here is an example of how it works.
Suppose the connection set contains the events `f1 s1 f2`,

where f1 is the first failed-connection event,
s1 is the first successful-connection event,
f2 is the second failed-connection event.

Then we can calculate the result using the following formula:

connection_uptime = time_between(s1, f2) / time_between(f1, now) * 100

where

  • time_between(s1, f2) - connection up time

  • time_between(f1, now) - total time between first event (f1) and current moment


How to test

None

How to document

None

Activity


Yaroslav Podorvanov 
July 27, 2022 at 8:20 PM
(edited)

Test script with up and down:

Yaroslav Podorvanov 
July 27, 2022 at 8:07 PM

Testing build:

Test panic fixed:

panic fixed

Yaroslav Podorvanov 
July 27, 2022 at 8:37 AM
(edited)

everything is already discussed with , all fine!
A panic is possible when:
1. pmm-server is down
2. pmm-agent is then started
3. pmm-admin status is run, at which point pmm-agent panics

C W 
July 27, 2022 at 8:33 AM

you haven't provided any information about how the panic was triggered, so what did you do ahead of seeing the panic message?
I reported the panic to , but I only saw it when the agent wasn't set up. The following is the test script that I ran:

The unit file for the agent:

The config that the agent used:

Yaroslav Podorvanov 
July 26, 2022 at 10:24 PM

panic

Done

Details


Created November 15, 2019 at 4:37 PM
Updated March 6, 2024 at 5:12 AM
Resolved August 29, 2022 at 12:17 PM