Issues
- Request for adjusting the postgres_exporter code for adding another metricPMM-13697
- move exporters to the github actionsPMM-7602Resolved issue: PMM-7602Andrii Skomorokhov
- Review SHOW VARIABLES "conversion" for Text VariablesPMM-4593
- Node Exporter Update in PMMPMM-4588
- Test with MongoDB 4.2PMM-4221
- Remove global variables and improve test coverage to xx%PMM-4167
- mongodb_mongod_op_latencies_histogram type gaugePMM-3535
- Improve "build" section on exporters documentation pagesPMM-3402
- Have up-to-date releases in github pages for exportersPMM-3398Roma Novikov
- mongodb_exporter: add builds to Github releasesPMM-3356Resolved issue: PMM-3356
- listen on a unix sockPMM-2632
Request for adjusting the postgres_exporter code for adding another metric
Description
How to test
How to document
Activity
Bhargav Kamineni last week
@David Gonzalez CLS was done from my side a month back, I think Nurlan needs to sign the CLA, @Nurlan Moldomurov Please confirm if Iām correct or let me know if something is needed from my side. Thank you
https://github.com/percona/postgres_exporter/pull/293#issuecomment-2641851377
David Gonzalez March 5, 2025 at 7:33 PM
Hi @Bhargav Kamineni , thank you for this request!
I was checking the latest status of the PR and it seems it is on hold because it is missing a CLA signature, please see: https://github.com/percona/postgres_exporter/pull/293#issuecomment-2648041689
Can you take a look?
Bhargav Kamineni February 10, 2025 at 10:45 AM
Thanks @Nurlan Moldomurov , Can you provide an estimated ETA for reviewing or completing this request? This is just to update the client.
Nurlan Moldomurov February 5, 2025 at 8:01 AM
Hi @Bhargav Kamineni , thanks for the PR, but that one was created for upstream. I created another one for our fork https://github.com/percona/postgres_exporter/pull/293/files
Bhargav Kamineni February 5, 2025 at 3:39 AM
@Nurlan Moldomurov Thanks for granting the access , I have created a pull request
https://github.com/prometheus-community/postgres_exporter/pull/1119
Please check and let me know of any comments, thank you!
Hi Team,
Problem Statement:
We have encountered an issue with the
pg_stat_activity_max_tx_duration
alert, specifically for long-running queries and idle-in-transaction states. This issue affects our ability to accurately identify the root cause of long-running transactions and queries, as the current alerting mechanism leads to confusion regarding the actual duration of the transaction versus the query.Current Situation:
In the current implementation, when we receive an alert for long-running transactions, the alert is often triggered based on the entire transaction duration, which includes multiple queries. This results in false perceptions about the query durations. For example, consider the following transaction:
BEGIN; (09:00) STMT1; (09:01) STMT2; (09:03) STMT3; (09:03) STMT4; (09:04) STMT5; (09:07) STMT6; (09:12) STMT7; (09:35) STMT8; (09:35) STMT9; (09:35) STMT10; (09:36) STMT11; (09:39) COMMIT; (09:40)
When we receive an alert, we might believe that
STMT6 has been running for 35 minutes, as the transaction started at 09:00. The alert was triggered when COMMIT was executed at 09:40. In reality, STMT6 started at 09:12 and completed at 09:35, so its duration was only 23 minutes. The alert mechanism checks xact_start (transaction start) rather than query_start
(query start), leading to incorrect conclusions.This issue arises from the following query in the
postgres_exporter
code:SELECT pg_database.datname, tmp.state, tmp2.usename, tmp2.application_name, COALESCE(count, 0) AS count, COALESCE(max_tx_duration, 0) AS max_tx_duration FROM ( VALUES ('active'), ('idle'), ('idle in transaction'), ('idle in transaction (aborted)'), ('fastpath function call'), ('disabled') ) AS tmp(state) CROSS JOIN pg_database LEFT JOIN ( SELECT datname, state, usename, application_name, count(*) AS count, MAX(EXTRACT(EPOCH FROM now() - xact_start))::float AS max_tx_duration FROM pg_stat_activity GROUP BY datname, state, usename, application_name ) AS tmp2 ON tmp.state = tmp2.state AND pg_database.datname = tmp2.datname
This code calculates the maximum transaction duration (
max_tx_duration
) based onxact_start
, which includes all queries within a transaction, rather than just the query that is running for the longest period.Feature Request:
To address this issue, we propose the following enhancements to the
postgres_exporter
and its metrics:Add a new metric: We should introduce a new metric for tracking the maximum state duration (
max_state_duration
), which would be based on thestate_change
timestamp rather thanxact_start
. This will help in identifying long-running queries accurately, without being influenced by the overall transaction duration.Update the query: Modify the query to calculate the
max_state_duration
, which will represent the longest period that a transaction has been in a specific state (such asidle in transaction
) since the state changed.The updated query would look like this:
SELECT pg_database.datname, tmp.state, tmp2.usename, tmp2.application_name, COALESCE(count, 0) AS count, COALESCE(max_tx_duration, 0) AS max_tx_duration, COALESCE(max_state_duration, 0) AS max_state_duration FROM (VALUES ('active'), ('idle'), ('idle in transaction'), ('idle in transaction (aborted)'), ('fastpath function call'), ('disabled')) AS tmp(state) CROSS JOIN pg_database LEFT JOIN ( SELECT datname, state, usename, application_name, count(*) AS count, MAX(EXTRACT(EPOCH FROM now() - xact_start))::float AS max_tx_duration, MAX(EXTRACT(EPOCH FROM now() - state_change))::float AS max_state_duration FROM pg_stat_activity GROUP BY datname, state, usename, application_name ) AS tmp2 ON tmp.state = tmp2.state AND pg_database.datname = tmp2.datname
Update the exporter metric definition: The new metric (
max_state_duration
) would need to be added to thepostgres_exporter
metric definition as follows:"pg_stat_activity": { map[string]ColumnMapping{ "datname": {LABEL, "Name of this database", nil, nil}, "state": {LABEL, "connection state", nil, semver.MustParseRange(">=9.2.0")}, "usename": {LABEL, "Name of the user logged into this backend", nil, nil}, "application_name": {LABEL, "Name of the application that is connected to this backend", nil, nil}, "count": {GAUGE, "number of connections in this state", nil, nil}, "max_tx_duration": {GAUGE, "max duration in seconds any active transaction has been running", nil, nil}, "max_state_duration": {GAUGE, "max state change duration in seconds any active transaction has been", nil, nil}, }, true, 0, },
Modify the alerting rules: With the introduction of
max_state_duration
, we can modify our existing alerts to be more accurate and based on the new metric:Idle in Transaction Alert: Trigger when a transaction has been in the "idle in transaction" state for more than 5 minutes without state change:
alert: PostgresqlIdleInTransaction expr: pg_stat_activity_max_state_duration{state="idle in transaction"} > 300
Long-Running Transactions Alert: Keep the current alert based on transaction duration (
max_tx_duration
):- alert: PostgresqlTransactionDuration expr: pg_stat_activity_max_tx_duration{application_name=~'.*', usename!=''} > 420
Query Duration Alert: Trigger a new alert for long-running queries based on the time spent in the "active" state:
- alert: PostgresqlQueryDuration expr: pg_stat_activity_max_state_duration{state="active"} > 360