Issues
- Request for adjusting the postgres_exporter code for adding another metricPMM-13697
- Update PMM documentation to mention minimum privileges required to monitor Postgres with both pg_stat_statement and pg_stat_monitorPMM-12963
- [BE] QAN for pg_stat_monitor works incorrectly if value of 'pg_stat_monitor.pgsm_bucket_time' different from defaultPMM-10197
- PostgreSQL Metrics Monitoring Failure on Aurora/RDSPMM-7741Resolved issue: PMM-7741
- Negative query's ID for Postgres queryPMM-4753Resolved issue: PMM-4753
- [SPIKE] LLVM custom packages from PerconaPKG-211Resolved issue: PKG-211naeem.akhter
- Spike / POC - define the cost estimation of installers solutionPKG-209Resolved issue: PKG-209stefan.vinasi
- pg_tde Beta1 needs to be rebuildPKG-138Resolved issue: PKG-138Muhammad Aqeel
- Update packaging scripts for PG 17PKG-132Resolved issue: PKG-132Muhammad Aqeel
- set_user extension build fails for PG 17 from REL4_0_1 tagPKG-131Resolved issue: PKG-131naeem.akhter
- Enable use of container secretsPKG-130surabhi.bhat
- Packaging and build consistency across different platformsPKG-129Resolved issue: PKG-129Muhammad Aqeel
- pg_repack build fails on ubuntu/debian platforms with PG 17 beta2PKG-128Resolved issue: PKG-128Muhammad Aqeel
- Build Ticket - TemplatePKG-127Resolved issue: PKG-127Evgeniy Patlan
- There are too many PERL packages when installing percona-ppg-server15PKG-126Muhammad Aqeel
- Do not send telemetry for PGPKG-125Alex Miroshnychenko
- ARM support for PostgreSQLPKG-124naeem.akhter
- 'pg_tde_change_key_provider' binary is missing in nightly Debian packagesPG-1454Resolved issue: PG-1454naeem.akhter
- Automatic stack trace for PGPG-1439
- Easier PG coredumpsPG-1438
- Work towards upstreaming extensible SMGRsPG-1432andreas.karlsson
- TDE: Integration with Entrust KeyControlPG-1407
- TDE: Integration with Fortanix KMSPG-1406
- TDE: Integration with Kubernetes Secrets ProviderPG-1405
- TDE: Integration with Google Cloud - Cloud Key Management ServicePG-1404
- TDE: Integration with Microsoft Azure Key VaultPG-1403
- TDE: Integration with AWS KMSPG-1402
- TDE integration with Okta KMS (KMIP based)PG-1386
- TDE integration with Akeyless KMS (KMIP based)PG-1385
- WAL performance enhancementsPG-1362
- Okta Universal Logout supportPG-1312
- TDE: Global keyringPG-1286
- PostgreSQL Quarterly Update Q2/2025PG-1285
- PostgreSQL Quarterly Update Q1/2025PG-1284
- TDE: Certify pg_tde with Thales HSM and Thales KMSPG-1265
- WAL fix/enhancementsPG-1262
- TDE: KMS config options for an offline databasePG-1249
- Access method enforcementPG-1231
- Add pgvectorscale and pgaiPG-1228Muhammad Aqeel
- OIDC made easier for PostgreSQLPG-1227
- Support of anonymization for PGPG-1226
- Automated documentation testingPG-1221
- Providing pg_tde for customers on older version(s)PG-1218
- TDE: Cipher configurationPG-1194
- Upstream contribution of the WAL extensibility improvementPG-1193
- Badges for PG productsPG-1192
- PG easier deployment: LLVM shippingPG-1103naeem.akhter
- PostgreSQL InstallatorsPG-1102Muhammad Aqeel
- Make Valkey use with PostgreSQL easierPG-1096
- Upstream contribution of SMGR improvementsPG-1094
50 of 73
Hi Team,
Problem Statement:
We have encountered an issue with the
pg_stat_activity_max_tx_duration
alert, specifically for long-running queries and idle-in-transaction states. This issue affects our ability to accurately identify the root cause of long-running transactions and queries, as the current alerting mechanism leads to confusion regarding the actual duration of the transaction versus the query.Current Situation:
In the current implementation, when we receive an alert for long-running transactions, the alert is often triggered based on the entire transaction duration, which includes multiple queries. This results in false perceptions about the query durations. For example, consider the following transaction:
When we receive an alert, we might believe that
STMT6 has been running for 35 minutes, as the transaction started at 09:00. The alert was triggered when COMMIT was executed at 09:40. In reality, STMT6 started at 09:12 and completed at 09:35, so its duration was only 23 minutes. The alert mechanism checks xact_start (transaction start) rather than query_start
(query start), leading to incorrect conclusions.This issue arises from the following query in the
postgres_exporter
code:This code calculates the maximum transaction duration (
max_tx_duration
) based onxact_start
, which includes all queries within a transaction, rather than just the query that is running for the longest period.Feature Request:
To address this issue, we propose the following enhancements to the
postgres_exporter
and its metrics:Add a new metric: We should introduce a new metric for tracking the maximum state duration (
max_state_duration
), which would be based on thestate_change
timestamp rather thanxact_start
. This will help in identifying long-running queries accurately, without being influenced by the overall transaction duration.Update the query: Modify the query to calculate the
max_state_duration
, which will represent the longest period that a transaction has been in a specific state (such asidle in transaction
) since the state changed.The updated query would look like this:
Update the exporter metric definition: The new metric (
max_state_duration
) would need to be added to thepostgres_exporter
metric definition as follows:Modify the alerting rules: With the introduction of
max_state_duration
, we can modify our existing alerts to be more accurate and based on the new metric:Idle in Transaction Alert: Trigger when a transaction has been in the "idle in transaction" state for more than 5 minutes without state change:
Long-Running Transactions Alert: Keep the current alert based on transaction duration (
max_tx_duration
):Query Duration Alert: Trigger a new alert for long-running queries based on the time spent in the "active" state: