Issues
- Request for adjusting the postgres_exporter code for adding another metric (PMM-13697)
- Define default retention for system log tables or disable them (PMM-13644, talha.rizwan)
- I/O error: Broken pipe, while writing to socket (PMM-13517, Aaditya Dubey)
- gRPC error: rpc error: code = PermissionDenied desc = No Agent with ID (PMM-13509, resolved)
- Example tab displaying just $ values rather than actual values for the slow queries in the QAN for PostgreSQL DB (PMM-13195)
- User-installed Altinity plugin for Clickhouse is uninstalled after PMM upgrade (PMM-13169)
- Unable to Capture MongoDB metrics - dbStats, collStats, indexStats and topmetrics (PMM-13163, resolved)
- QAN not showing up when adding a mongo node (PMM-13161, resolved)
- RDS API changes the data sent in the request (PMM-13157, Alex Demidoff)
- A blank line is accepted for the node address (PMM-13156)
- Single "invalid" char permitted for address in v1/inventory/Nodes/Add (PMM-13155)
- MySQL/MariaDB Schemaname is empty in QAN (PMM-13148, Jiří Čtvrtka)
- Ability to see database users/roles (PMM-13143)
- Override Summary and Description when alert is created (PMM-13140)
- ProxySQL Instance Summary dashboard is missing filtering by node names (PMM-13139)
- Need options to disable specific collectors in the PMM UI (PMM-13130)
- QAN page search by..for queries (PMM-13124)
- Query Response graphs has no buckets for <100ms (PMM-13123, resolved, Nurlan Moldomurov)
- navigation between PMM pages has stopped persisting service names and timeframes when changing pages (PMM-13122, resolved, Matej Kubinec)
- Add a way to ignore insecure certificates when adding a backup destination (PMM-13116)
- Support for Application Load Balancer, which allows for AWS auto-managed SSL certificates (PMM-13110)
- Alerts are not getting fired in PMM 2.41.2 (PMM-13109, resolved)
- Support of partial Certificates for Remote MySQL monitoring (PMM-13103)
- Incorrect description for the pmm_mysql_too_many_connections Alerting Rule (PMM-13102, Agustin Gallego)
- PMM agent: `log-level` option is not documented (PMM-13088)
- QAN shows throttle message as a slow query (PMM-13081)
- Allow support to modify the vmagent remoteWrite.maxDiskUsagePerURL setting (PMM-13062)
- PMM server-side logs should contain more than 1k lines (PMM-13054, resolved, Alex Demidoff)
- IAM Assume Role support on PMM (PMM-13045)
- Dashboard MongoDB Collections Overview - Feedback / Improvement request (PMM-13030, resolved, Santo Leto)
- Dashboard "MongoDB Oplog Details" - filter is not correctly applied in chart "Oplog GB/Hour" (PMM-13029, resolved, Yash Sartanpara)
- pg_replication_lag parameter from custom queries calculated incorrectly (PMM-13018)
- During upgrade pmm-agent tries to start while it's already running (PMM-13007, resolved, Nurlan Moldomurov)
- NGINX logs contains "SSL_read" errors (PMM-13006)
- It's not possible to add Postgres with pg_stat_monitor extension for pmm2-client < 2.41.2 (PMM-13001, resolved, Jiří Čtvrtka)
- AMI + OVF upgrade from versions older than 2.41.0 is unstable (PMM-12998, resolved)
- Postgres_exporter.yml: no such file or directory error after adding postgresql with pmm-admin (PMM-12995)
- RDS exporter CPU metrics got collected without mode after adding more than 1 RDS instance (PMM-12993)
- Usage of pg_stat_monitor by default on Add Service page (PMM-12952, resolved, Matej Kubinec)
- Web UI not available after pmm-server restart (AWS AMI) - nginx error 500 (PMM-12860, resolved)
- Due to the click house DB, the PMM server is under too much CPU strain. (PMM-12438)
- Pmm throwing alerts after removing it from the pmm. (PMM-9589, resolved)
Hi Team,
Problem Statement:
We have encountered an issue with the `pg_stat_activity_max_tx_duration` alert, specifically for long-running queries and idle-in-transaction states. The current alerting mechanism blurs the duration of a transaction with the duration of an individual query, which makes it hard to identify the root cause of long-running transactions and queries.
Current Situation:
In the current implementation, an alert for a long-running transaction is triggered based on the entire transaction duration, which can span multiple queries. This gives a false impression of how long any single query has been running. For example, consider the following transaction:
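The timing can be sketched with the timestamps from this report. This is an illustration in Python, not the original transaction listing; the calendar date and the variable names are placeholders we picked for the example.

```python
from datetime import datetime

# Timestamps taken from this report; the calendar date is an arbitrary placeholder.
day = (2024, 1, 1)
xact_start = datetime(*day, 9, 0)    # BEGIN (what pg_stat_activity.xact_start records)
query_start = datetime(*day, 9, 12)  # STMT6 starts (pg_stat_activity.query_start)
query_end = datetime(*day, 9, 35)    # STMT6 completes


def minutes(delta):
    """Convert a timedelta to whole minutes."""
    return int(delta.total_seconds() // 60)


# Age reported by a check based on xact_start at the moment STMT6 finishes.
tx_age = minutes(query_end - xact_start)
# Time STMT6 itself actually spent running.
stmt_duration = minutes(query_end - query_start)

print(f"transaction age: {tx_age} min, STMT6 duration: {stmt_duration} min")
```

The 12-minute gap between the two numbers is the time spent in earlier statements of the same transaction, which a check based on `xact_start` attributes to whatever statement happens to be running.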
When we receive an alert, we might believe that STMT6 has been running for 35 minutes, as the transaction started at 09:00. The alert was triggered when COMMIT was executed at 09:40. In reality, STMT6 started at 09:12 and completed at 09:35, so its duration was only 23 minutes. The alert mechanism checks `xact_start` (transaction start) rather than `query_start` (query start), leading to incorrect conclusions.
This issue arises from the following query in the `postgres_exporter` code:
This code calculates the maximum transaction duration (`max_tx_duration`) based on `xact_start`, which includes all queries within a transaction, rather than just the query that is running for the longest period.
Feature Request:
To address this issue, we propose the following enhancements to the `postgres_exporter` and its metrics:
- Add a new metric: We should introduce a new metric for tracking the maximum state duration (`max_state_duration`), which would be based on the `state_change` timestamp rather than `xact_start`. This will help in identifying long-running queries accurately, without being influenced by the overall transaction duration.
- Update the query: Modify the query to calculate the `max_state_duration`, which will represent the longest period that a transaction has been in a specific state (such as `idle in transaction`) since the state changed.
The updated query would look like this:
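As a rough sketch only (the exact query text from this request is not reproduced here), the aggregate could gain one extra column computed from `state_change`. The SQL below is our assumption of what that might look like, wrapped in a Python string, together with a small pure-Python mirror of the aggregation so the behaviour can be checked on sample rows without a database. `PROPOSED_SQL`, `aggregate`, and the sample database name `app` are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumed shape of the aggregate: max_state_duration added next to
# max_tx_duration. xact_start and state_change are pg_stat_activity columns.
PROPOSED_SQL = """
SELECT
    datname,
    state,
    count(*) AS count,
    MAX(EXTRACT(EPOCH FROM now() - xact_start))::float   AS max_tx_duration,
    MAX(EXTRACT(EPOCH FROM now() - state_change))::float AS max_state_duration
FROM pg_stat_activity
GROUP BY datname, state
"""


def aggregate(rows, now):
    """Pure-Python mirror of the SQL above, for sample rows only.

    NULL timestamps are skipped, and an all-NULL group yields 0.0,
    which is what wrapping the column in COALESCE(..., 0) would give.
    """
    out = {}
    for r in rows:
        key = (r["datname"], r["state"])
        agg = out.setdefault(
            key, {"count": 0, "max_tx_duration": 0.0, "max_state_duration": 0.0}
        )
        agg["count"] += 1
        for col, metric in (("xact_start", "max_tx_duration"),
                            ("state_change", "max_state_duration")):
            if r[col] is not None:
                agg[metric] = max(agg[metric], (now - r[col]).total_seconds())
    return out


# Sample row: transaction open for 35 minutes, current state entered 23 minutes ago.
now = datetime(2024, 1, 1, 9, 35)
rows = [{"datname": "app", "state": "active",
         "xact_start": now - timedelta(minutes=35),
         "state_change": now - timedelta(minutes=23)}]
result = aggregate(rows, now)
print(result[("app", "active")])
```

For the sample row, `max_tx_duration` comes out at 2100 seconds while `max_state_duration` is 1380 seconds, which is the distinction this request is about.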
- Update the exporter metric definition: The new metric (`max_state_duration`) would need to be added to the `postgres_exporter` metric definition as follows:
- Modify the alerting rules: With the introduction of `max_state_duration`, we can modify our existing alerts to be more accurate and based on the new metric:
  - Idle in Transaction Alert: Trigger when a transaction has been in the "idle in transaction" state for more than 5 minutes without state change:
  - Long-Running Transactions Alert: Keep the current alert based on transaction duration (`max_tx_duration`):
  - Query Duration Alert: Trigger a new alert for long-running queries based on the time spent in the "active" state:
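The three conditions above can be sketched as plain threshold checks. This is only an illustration in Python, not actual alert rule syntax. The alert names and the function `evaluate_alerts` are hypothetical, and only the 5-minute idle-in-transaction threshold comes from this request; the other two thresholds are placeholders.

```python
# Thresholds in seconds. Only the 5-minute value is taken from this request.
IDLE_IN_TX_LIMIT = 5 * 60
LONG_TX_LIMIT = 30 * 60     # hypothetical placeholder
LONG_QUERY_LIMIT = 10 * 60  # hypothetical placeholder


def evaluate_alerts(sample):
    """Return the (hypothetical) alert names that fire for one per-state sample."""
    fired = []
    state = sample["state"]
    # Idle in Transaction: time in the current state, not transaction age.
    if state == "idle in transaction" and sample["max_state_duration"] > IDLE_IN_TX_LIMIT:
        fired.append("IdleInTransaction")
    # Long-Running Transactions: unchanged, still based on max_tx_duration.
    if sample["max_tx_duration"] > LONG_TX_LIMIT:
        fired.append("LongRunningTransaction")
    # Query Duration: time spent in the "active" state.
    if state == "active" and sample["max_state_duration"] > LONG_QUERY_LIMIT:
        fired.append("LongRunningQuery")
    return fired


active = {"state": "active", "max_tx_duration": 2100.0, "max_state_duration": 1380.0}
idle = {"state": "idle in transaction", "max_tx_duration": 400.0, "max_state_duration": 360.0}
print(evaluate_alerts(active))
print(evaluate_alerts(idle))
```

Keeping the transaction alert on `max_tx_duration` and moving the other two to `max_state_duration` means each alert reports the duration it is named after.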