Dead Tuples graph on PostgreSQL Vacuum Monitoring Dashboard displays invalid percentage
Description
The issue can be seen on the pmmdemo website. The Dead Tuples graph displays 10000% dead tuples, which is invalid.
Here is the link: https://pmmdemo.percona.com/graph/d/postgres_vacuum_monitoring/postgresql-vacuum-monitoring?from=now-12h&to=now&var-interval=$__auto_interval_interval&var-environment=All&var-node_name=pg12&var-service_name=pg12-postgresql&var-database=sbtest&var-work_mem=4194304&var-version=14.7&var-max_connections=100&var-shared_buffers=134217728&var-wal_buffers=4194304&var-wal_segment_size=16777216&var-maintenance_work_mem=67108864&var-block_size=8192&var-checkpoint_segments=&var-checkpoint_timeout=300&var-default_statistics_target=100&var-seq_page_cost=1&var-random_page_cost=4&var-effective_cache_size=4294967296&var-effective_io_concurrency=1&var-fsync=1&var-autovacuum=1&var-autovacuum_analyze_scale_factor=0.1&var-autovacuum_analyze_threshold=50&var-autovacuum_vacuum_scale_factor=0.2&var-autovacuum_vacuum_threshold=50&var-autovacuum_vacuum_cost_limit=1684514664000&var-autovacuum_vacuum_cost_delay=0.002&var-autovacuum_max_workers=3&var-autovacuum_naptime=60&var-autovacuum_freeze_max_age=200000000&var-logging_collector=1&var-log_min_duration_statement=1684513058000&var-log_duration=0&var-log_lock_waits=0&var-max_wal_senders=10&var-max_wal_size=1073741824&var-min_wal_size=83886080&var-wal_compression=0&var-max_worker_processes=8&var-max_parallel_workers_per_gather=2&var-max_parallel_workers=2&var-autovacuum_work_mem=1684514664000&var-autovacuum_multixact_freeze_max_age=400000000&var-cluster=All&var-replication_set=All&var-node_type=All&var-service_type=All&var-username=All&var-schema=All&orgId=1&refresh=1m&var-tablename=sbtest1
How to test
Set up PMM, add a PostgreSQL service with some vacuum activity, and check the Dead Tuples graph on the PostgreSQL Vacuum Monitoring dashboard (a possible repro is sketched below).
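For illustration only, a minimal sketch of one way to generate dead tuples for this test; the table name `vacuum_test` and the row counts are arbitrary assumptions, not part of the original report:

```sql
-- Hypothetical repro: create a throwaway table and update every row a
-- couple of times so dead tuples accumulate before autovacuum catches up.
CREATE TABLE vacuum_test AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 100000) g;

-- Each UPDATE leaves the previous row version behind as a dead tuple.
UPDATE vacuum_test SET payload = md5(random()::text);
UPDATE vacuum_test SET payload = md5(random()::text);

-- Sanity check before looking at the dashboard (statistics may lag a little).
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'vacuum_test';
```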
How to document
-
AFFECTED CS IDs
Activity
saikumar.vs September 1, 2023 at 1:37 PM
Verified, please see the test case section for further details.
Leonardo Bacchi Fernandes May 26, 2023 at 6:06 PM
Hello Aleksandar,
Sorry about that, and thank you for explaining! I saw the bug report marked as Invalid after your comment and thought you had marked it as such.
@Anton Bystrov could you please consider revisiting this bug? The problem is in the query being used, not the PMM display/configuration itself.
Best,
Leo
aleksandar.yanakiev May 26, 2023 at 11:59 AM
Hey Leonardo
All of this is up to Anton to figure out and improve; my task was just to check whether the configuration and behavior of the pg12 instance are up to spec, which is why I added that in the comment. IMO the graph definitely needs either a change to the calculation or a better explanation so users can understand what metric it is actually showing.
Leonardo Bacchi Fernandes May 25, 2023 at 4:58 PM
Hello @aleksandar.yanakiev,
The graph is named "Dead Tuples", and the percentage shown is 9998-10000%. The name "dead tuples" implies this is the percentage of the entire table that consists of dead tuples. So you should only be able to have at most 100% of the table as dead tuples (all tuples are marked as dead), right?
I guess the other option is this being the percentage of dead tuples over live tuples, which is confusing and unintuitive. In this case, would it mean there are 10x more dead tuples than live tuples in the table?
I think calculating this over the total size of the table would be best. For example, if you see 100% dead tuples in that graph, you'd think the table is composed entirely of dead tuples, but if you're calculating this based on live tuples, it means half the table is composed of dead tuples, which is far less intuitive (seeing 100% and understanding this is half the table).
What do you think?
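For reference, a sketch of the two calculations being discussed, expressed against pg_stat_user_tables (presumably the source of the dashboard's live/dead tuple counters); which expression the dashboard actually uses is an assumption here, and the table name is taken from the pmmdemo link:

```sql
-- Dead-tuple percentage computed two ways; only the first is bounded by 100%.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       -- share of all tuples in the table that are dead (0-100%)
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct_of_table,
       -- dead over live: unbounded, reads ~10000% when dead tuples vastly outnumber live ones
       round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 1) AS dead_over_live_pct
FROM pg_stat_user_tables
WHERE relname = 'sbtest1';
```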
aleksandar.yanakiev May 24, 2023 at 2:58 PM
I double-checked the pg12 server on the pmmdemo side:
- re-installed the packages `pmm-agent` and `percona-pg_stat_monitor12` (pmm-agent was updated but pg_stat_monitor was not)
- dropped the content of /var/lib/pgsql and re-initialized the DB
- reconfigured the extension based on the latest config from https://docs.percona.com/pg-stat-monitor/
- ran `DROP/CREATE EXTENSION pg_stat_monitor;`
- double-checked the permissions of the dedicated monitoring user and made sure it's working (see the sketch after this list)
- deleted and re-added the PMM service: `pmm-admin remove/add postgresql [SERVICE_NAME]`
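A rough illustration of the permissions check mentioned above; the monitoring user name `pmm` is an assumption here, substitute the actual dedicated user:

```sql
-- Hypothetical checks for the dedicated monitoring user (assumed name: pmm).
SELECT has_table_privilege('pmm', 'pg_stat_user_tables', 'SELECT');  -- can the user read table statistics?
SELECT pg_has_role('pmm', 'pg_monitor', 'MEMBER');                   -- is the user a member of pg_monitor?
-- Run as the monitoring user: confirm the pg_stat_monitor view is present and readable.
SELECT count(*) FROM pg_stat_monitor;
```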
There were a couple of errors in the pmm-agent logs prior to the above, which are all solved now:
May 24 10:24:21 pg12 pmm-agent[30325]: ERRO[2023-05-24T10:24:21.369+00:00] non default bucket time value is not supported, status changed to WAITING agentID=/agent_id/ca1876b6-4fc2-41dc-ab0b-62ead2bbf0ee component=agent-builtin type=qan_postgresql_pgstatmonitor_agent
+
May 24 11:13:10 pg12 pmm-agent[18128]: ERRO[2023-05-24T11:13:10.600+00:00] failed to get extended monitor status: failed to query pg_stat_monitor: pq: column pg_stat_monitor.rows does not exist agentID=/agent_id/ca1876b6-4fc2-41dc-ab0b-62ead2bbf0ee component=agent-builtin type=qan_postgresql_pgstatmonitor_agent
May 24 11:13:10 pg12 pmm-agent[18128]: WARN[2023-05-24T11:13:10.600+00:00] failed to get pgsm_bucket_time, wait time set on 60 seconds agentID=/agent_id/ca1876b6-4fc2-41dc-ab0b-62ead2bbf0ee component=agent-builtin type=qan_postgresql_pgstatmonitor_agent
May 24 11:13:10 pg12 pmm-agent[18128]: INFO[2023-05-24T11:13:10.600+00:00] Sending status: WAITING. agentID=/agent_id/ca1876b6-4fc2-41dc-ab0b-62ead2bbf0ee component=agent-builtin type=qan_postgresql_pgstatmonitor_agent
The most promising find was that after stopping the sysbench load, the graph in PMM started showing `No data`; the moment I started sysbench again, the graph started showing numbers in the range 9998-10000%. Most probably the artificial load from sysbench is generating these absurd (or maybe not that absurd) values.
Hope all of the above is somewhat helpful.
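A quick way to cross-check those readings against what PostgreSQL itself reports while sysbench is running (table name taken from the pmmdemo link):

```sql
-- Snapshot of live vs. dead tuples and autovacuum activity on the sysbench table.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'sbtest1';
```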
Details
Assignee: Unassigned
Reporter: Leonardo Bacchi Fernandes
Priority: Medium
Components: -
Labels: -
Needs QA: Yes
Needs Doc: No
Planned Version/s: -
Fix versions: -
Story Points: 2
Affects versions: -
