Issues

Select view

Select search mode

 
50 of 73

Request for adjusting the postgres_exporter code for adding another metric

Description

Hi Team,

Problem Statement:

We have encountered an issue with the pg_stat_activity_max_tx_duration alert, specifically for long-running queries and idle-in-transaction states. This issue affects our ability to accurately identify the root cause of long-running transactions and queries, as the current alerting mechanism leads to confusion regarding the actual duration of the transaction versus the query.

Current Situation:

In the current implementation, when we receive an alert for long-running transactions, the alert is often triggered based on the entire transaction duration, which includes multiple queries. This results in false perceptions about the query durations. For example, consider the following transaction:

When we receive an alert, we might believe that STMT6 has been running for 35 minutes, as the transaction started at 09:00. The alert was triggered when COMMIT was executed at 09:40. In reality, STMT6 started at 09:12 and completed at 09:35, so its duration was only 23 minutes. The alert mechanism checks xact_start (transaction start) rather than query_start (query start), leading to incorrect conclusions.

This issue arises from the following query in the postgres_exporter code:

This code calculates the maximum transaction duration (max_tx_duration) based on xact_start, which includes all queries within a transaction, rather than just the query that is running for the longest period.

Feature Request:

To address this issue, we propose the following enhancements to the postgres_exporter and its metrics:

  1. Add a new metric: We should introduce a new metric for tracking the maximum state duration (max_state_duration), which would be based on the state_change timestamp rather than xact_start. This will help in identifying long-running queries accurately, without being influenced by the overall transaction duration.

  2. Update the query: Modify the query to calculate the max_state_duration, which will represent the longest period that a transaction has been in a specific state (such as idle in transaction) since the state changed.

    The updated query would look like this:

  1. Update the exporter metric definition: The new metric (max_state_duration) would need to be added to the postgres_exporter metric definition as follows:

  1. Modify the alerting rules: With the introduction of max_state_duration, we can modify our existing alerts to be more accurate and based on the new metric:

  • Idle in Transaction Alert: Trigger when a transaction has been in the "idle in transaction" state for more than 5 minutes without state change:

Long-Running Transactions Alert: Keep the current alert based on transaction duration (max_tx_duration):

Query Duration Alert: Trigger a new alert for long-running queries based on the time spent in the "active" state:

How to test

None

How to document

None

Details

Assignee

Reporter

Priority

Components

Needs QA

Yes

Needs Doc

Yes

Affects versions

Smart Checklist

Created January 24, 2025 at 5:42 AM
Updated 2 days ago

Activity

Show:

Bhargav Kamineni last week

CLS was done from my side a month back, I think Nurlan needs to sign the CLA, Please confirm if Iā€™m correct or let me know if something is needed from my side. Thank you

 

David Gonzalez March 5, 2025 at 7:33 PM

Hi , thank you for this request!

I was checking the latest status of the PR and it seems it is on hold because it is missing a CLA signature, please see:

Can you take a look?

Bhargav Kamineni February 10, 2025 at 10:45 AM

Thanks , Can you provide an estimated ETA for reviewing or completing this request? This is just to update the client.

Nurlan Moldomurov February 5, 2025 at 8:01 AM

Hi , thanks for the PR, but that one was created for upstream. I created another one for our fork

Bhargav Kamineni February 5, 2025 at 3:39 AM

Thanks for granting the access , I have created a pull request

Please check and let me know of any comments, thank you!

Loading...