pmm_managed_inventory_agents is missing important information
Description
How to test
1/ deploy a fresh PMM instance
2/ add a MySQL (or PostgreSQL or MongoDB) service to monitoring
3/ create an alert rule from the template called "PMM agent down"; feel free to change the value of the field "Duration" from 60s to a lower value, say 10s
4/ stop the pmm-agent that monitors the service added at step 2 (usually by running `systemctl stop pmm-agent`)
5/ wait for 60 seconds (or whatever Duration value you have defined at step 3)
6/ go to the Alerts page in PMM UI (https://<pmm-server>/graph/alerting/alerts)
7/ you should see an active (firing) alert notification "PMM agent down"
8/ check and confirm that you can see the node name in both the alert description and the summary
How to document
We have added the `node_name` property to the "PMM agent down" alert template that ships with PMM. This makes it more convenient for users to identify the node where the failure occurred.
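For illustration, here is a minimal sketch of how the template could reference the node name. It assumes the `pmm_managed_inventory_agents` metric now carries a `node_name` label next to `node_id`, as proposed under "Suggested implementation". The field layout is copied from the current template quoted in the comments. The exact wording of the shipped template may differ.

```yaml
---
templates:
  - name: pmm_agent_down
    version: 1
    summary: PMM agent down
    expr: 'pmm_managed_inventory_agents{agent_type="pmm-agent"} == bool 0'
    for: 1m
    severity: critical
    annotations:
      # Assumption: node_name is exposed as a label on the metric.
      # node_id is kept in the description to help with cross-referencing.
      description: |-
        PMM agent on {{ $labels.node_name }} ({{ $labels.node_id }}) cannot be reached. Host may be down.
      summary: PMM agent is down ({{ $labels.node_name }})
```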
Activity

Naresh December 11, 2023 at 3:02 AM
Sure, Thanks

Alex Demidoff December 7, 2023 at 11:50 AM
FYI: I believe we are now providing the node name in the labels object, so the following syntax could be even more useful:

Naresh December 6, 2023 at 1:38 PM
Hi
Thanks for the details, I will try to create the templates and update you.

Roma Novikov December 6, 2023 at 10:58 AM
Hi!
What you are asking for is out of scope for this task.
If you want more alert templates covering all agent problems, you can use the metric `pmm_managed_inventory_agents` and create more alert templates based on the `agent_type` label: https://docs.percona.com/percona-monitoring-and-management/get-started/alerting.html#template-example
The current template is:
```yaml
---
templates:
  - name: pmm_agent_down
    version: 1
    summary: PMM agent down
    expr: 'pmm_managed_inventory_agents{agent_type="pmm-agent"} == bool 0'
    for: 1m
    severity: critical
    annotations:
      description: |-
        PMM agent on {{ $labels.node_id }} cannot be reached. Host may be down.
      summary: PMM agent is down ({{ $labels.node_id }})
```
You can use it as a starting point to create more.

Naresh December 5, 2023 at 3:31 PM
Can you please give an update on the above?
Details
Assignee: Alex Demidoff
Reporter: C W
Priority: Medium
Needs QA: Yes
Needs Doc: No
Story Points: 1

User story:
As a DBA/SRE on call who needs to respond to a flurry of "PMM agent down" alerts, I am forced to spend time cross-referencing IDs before I can take further action.
UI/UX:
TBD
Acceptance criteria
Out of scope:
TBD
Suggested implementation:
1/ Provide the `node_name` label in the metric
2/ Use the `node_name` label in the templated messaging
How to test:
TBD
Details:
The `pmm_managed_inventory_agents` metric contains IDs but no human-friendly representation of the instance that has lost connectivity or been stopped. In addition, the template uses the unfriendly `node_id`. This is compounded by the fact that the `instance` value is `pmm-server`. Templates for services seem to use `service_name` in the alert messaging, and templates for nodes use `node_name`.
Here are example annotations from the current implementation: