Issues
- Request for adjusting the postgres_exporter code for adding another metricPMM-13697
- Database Topology Visualization for PMM ServerPMM-13640
- RDS API changes the data sent in the requestPMM-13157Alex Demidoff
- Override Summary and Description when alert is createdPMM-13140
- pmm-agent output has changed from stdout to stderrPMM-12997Resolved issue: PMM-12997Nurlan Moldomurov
- Advisors require regular reviewPMM-12589Resolved issue: PMM-12589parag.bhayani
- External services show with an incorrect status in the PMM inventoryPMM-12478Alex Demidoff
- Unable to add external service with a query string in the metrics-pathPMM-12473Resolved issue: PMM-12473Alex Demidoff
- No API to Delete Alerting RulesPMM-12461
- Missing data points with Textfile CollectorPMM-12402Resolved issue: PMM-12402
- Registering node with Grafana Admin flag enabled and non-admin role doesn't workPMM-11714Resolved issue: PMM-11714Nurlan Moldomurov
- Create/update a database cluster from a DBaaS template via PMMPMM-11550Resolved issue: PMM-11550Diogo Recharte
- [DBaaS][UI] allow expansion of disk resourcesPMM-11516Resolved issue: PMM-11516
- [DBaaS][UI] at rest encryptionPMM-11468Resolved issue: PMM-11468
- [DBaaS][UI] deletion of unbound PVCsPMM-11442Resolved issue: PMM-11442
- Allow a different IP for k8s to PMMPMM-11440Resolved issue: PMM-11440peter.szczepaniak
- PMM DBaaS edit db cluster brokenPMM-11428Resolved issue: PMM-11428Iaroslavna Soloveva
- Configuration templates for database and k8s settingsPMM-11420Resolved issue: PMM-11420Diogo Recharte
- Change from kingpin in the client package breaks automationPMM-10909Resolved issue: PMM-10909
- Ability to monitor for "Host Down"PMM-9544Resolved issue: PMM-9544Michael Okoko
- Backup EncryptionPMM-8742
- Provide an ability to disable query examples for mongoDB monitoringPMM-7860
- [DBaaS][UI] automatic updates schedulingEVEREST-19
Request for adjusting the postgres_exporter code for adding another metric
Description
How to test
How to document
Activity
Bhargav Kamineni last week
@David Gonzalez CLS was done from my side a month back, I think Nurlan needs to sign the CLA, @Nurlan Moldomurov Please confirm if Iām correct or let me know if something is needed from my side. Thank you
https://github.com/percona/postgres_exporter/pull/293#issuecomment-2641851377
David Gonzalez March 5, 2025 at 7:33 PM
Hi @Bhargav Kamineni , thank you for this request!
I was checking the latest status of the PR and it seems it is on hold because it is missing a CLA signature, please see: https://github.com/percona/postgres_exporter/pull/293#issuecomment-2648041689
Can you take a look?
Bhargav Kamineni February 10, 2025 at 10:45 AM
Thanks @Nurlan Moldomurov , Can you provide an estimated ETA for reviewing or completing this request? This is just to update the client.
Nurlan Moldomurov February 5, 2025 at 8:01 AM
Hi @Bhargav Kamineni , thanks for the PR, but that one was created for upstream. I created another one for our fork https://github.com/percona/postgres_exporter/pull/293/files
Bhargav Kamineni February 5, 2025 at 3:39 AM
@Nurlan Moldomurov Thanks for granting the access , I have created a pull request
https://github.com/prometheus-community/postgres_exporter/pull/1119
Please check and let me know of any comments, thank you!
Hi Team,
Problem Statement:
We have encountered an issue with the
pg_stat_activity_max_tx_duration
alert, specifically for long-running queries and idle-in-transaction states. This issue affects our ability to accurately identify the root cause of long-running transactions and queries, as the current alerting mechanism leads to confusion regarding the actual duration of the transaction versus the query.Current Situation:
In the current implementation, when we receive an alert for long-running transactions, the alert is often triggered based on the entire transaction duration, which includes multiple queries. This results in false perceptions about the query durations. For example, consider the following transaction:
BEGIN; (09:00) STMT1; (09:01) STMT2; (09:03) STMT3; (09:03) STMT4; (09:04) STMT5; (09:07) STMT6; (09:12) STMT7; (09:35) STMT8; (09:35) STMT9; (09:35) STMT10; (09:36) STMT11; (09:39) COMMIT; (09:40)
When we receive an alert, we might believe that
STMT6 has been running for 35 minutes, as the transaction started at 09:00. The alert was triggered when COMMIT was executed at 09:40. In reality, STMT6 started at 09:12 and completed at 09:35, so its duration was only 23 minutes. The alert mechanism checks xact_start (transaction start) rather than query_start
(query start), leading to incorrect conclusions.This issue arises from the following query in the
postgres_exporter
code:SELECT pg_database.datname, tmp.state, tmp2.usename, tmp2.application_name, COALESCE(count, 0) AS count, COALESCE(max_tx_duration, 0) AS max_tx_duration FROM ( VALUES ('active'), ('idle'), ('idle in transaction'), ('idle in transaction (aborted)'), ('fastpath function call'), ('disabled') ) AS tmp(state) CROSS JOIN pg_database LEFT JOIN ( SELECT datname, state, usename, application_name, count(*) AS count, MAX(EXTRACT(EPOCH FROM now() - xact_start))::float AS max_tx_duration FROM pg_stat_activity GROUP BY datname, state, usename, application_name ) AS tmp2 ON tmp.state = tmp2.state AND pg_database.datname = tmp2.datname
This code calculates the maximum transaction duration (
max_tx_duration
) based onxact_start
, which includes all queries within a transaction, rather than just the query that is running for the longest period.Feature Request:
To address this issue, we propose the following enhancements to the
postgres_exporter
and its metrics:Add a new metric: We should introduce a new metric for tracking the maximum state duration (
max_state_duration
), which would be based on thestate_change
timestamp rather thanxact_start
. This will help in identifying long-running queries accurately, without being influenced by the overall transaction duration.Update the query: Modify the query to calculate the
max_state_duration
, which will represent the longest period that a transaction has been in a specific state (such asidle in transaction
) since the state changed.The updated query would look like this:
SELECT pg_database.datname, tmp.state, tmp2.usename, tmp2.application_name, COALESCE(count, 0) AS count, COALESCE(max_tx_duration, 0) AS max_tx_duration, COALESCE(max_state_duration, 0) AS max_state_duration FROM (VALUES ('active'), ('idle'), ('idle in transaction'), ('idle in transaction (aborted)'), ('fastpath function call'), ('disabled')) AS tmp(state) CROSS JOIN pg_database LEFT JOIN ( SELECT datname, state, usename, application_name, count(*) AS count, MAX(EXTRACT(EPOCH FROM now() - xact_start))::float AS max_tx_duration, MAX(EXTRACT(EPOCH FROM now() - state_change))::float AS max_state_duration FROM pg_stat_activity GROUP BY datname, state, usename, application_name ) AS tmp2 ON tmp.state = tmp2.state AND pg_database.datname = tmp2.datname
Update the exporter metric definition: The new metric (
max_state_duration
) would need to be added to thepostgres_exporter
metric definition as follows:"pg_stat_activity": { map[string]ColumnMapping{ "datname": {LABEL, "Name of this database", nil, nil}, "state": {LABEL, "connection state", nil, semver.MustParseRange(">=9.2.0")}, "usename": {LABEL, "Name of the user logged into this backend", nil, nil}, "application_name": {LABEL, "Name of the application that is connected to this backend", nil, nil}, "count": {GAUGE, "number of connections in this state", nil, nil}, "max_tx_duration": {GAUGE, "max duration in seconds any active transaction has been running", nil, nil}, "max_state_duration": {GAUGE, "max state change duration in seconds any active transaction has been", nil, nil}, }, true, 0, },
Modify the alerting rules: With the introduction of
max_state_duration
, we can modify our existing alerts to be more accurate and based on the new metric:Idle in Transaction Alert: Trigger when a transaction has been in the "idle in transaction" state for more than 5 minutes without state change:
alert: PostgresqlIdleInTransaction expr: pg_stat_activity_max_state_duration{state="idle in transaction"} > 300
Long-Running Transactions Alert: Keep the current alert based on transaction duration (
max_tx_duration
):- alert: PostgresqlTransactionDuration expr: pg_stat_activity_max_tx_duration{application_name=~'.*', usename!=''} > 420
Query Duration Alert: Trigger a new alert for long-running queries based on the time spent in the "active" state:
- alert: PostgresqlQueryDuration expr: pg_stat_activity_max_state_duration{state="active"} > 360