pg_stat_monitor: Bucket is “Done” vs still being current/last
Activity
Hamid Akhtar January 16, 2023 at 12:17 PM (edited)
Documentation PR: https://github.com/percona/pgsm-docs/pull/16
naeem.akhter January 4, 2023 at 1:23 PM
We have added a new column, bucket_done, in PGSM, so the documentation might need an update.
naeem.akhter January 4, 2023 at 1:01 PM
Test case PR Merged. Verified.
Peter Zaitsev June 13, 2022 at 7:06 PM
Hm, I'm starting to get confused here.
1) The main use case of pg_stat_monitor is very low-overhead in-memory storage. As such, if PostgreSQL crashes, it is expected that all data will be lost. If we are persisting something, we probably should not, as this adds overhead that we do not want.
2) For PMM or any other monitoring application, the goal is to consume only completed buckets. If you think about it, as soon as a bucket is "completed" it can be shipped with PMM and persisted, because it is immutable. This makes the logic pretty simple compared with trying to work with timestamps, etc.
3) Having said that, for some use cases you may want to see recently completed queries too, so keeping data completely invisible until the bucket is finished is not a good option.
4) Whatever magic we need to do with views, we need to understand that PMM data capture is a rather common operation (once a minute by default), and we need to make sure it is heavily optimized. Specifically, we want to make sure PMM does not need to transfer data it does not really need, which means it needs to get all the data in completed buckets after the last bucket it successfully captured. This should be 1 bucket in 99.9% of cases, but in case of an agent restart, network issues, etc., it might be more than one bucket.
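Points 2 and 4 can be sketched as a small selection rule: take only rows from finished buckets, and only those newer than the last bucket already captured. The Python below is a minimal illustration, not PMM's or pg_stat_monitor's actual code. It assumes each row is a dict carrying the bucket_done flag mentioned in this ticket plus a bucket_start_time value; the function names and the checkpoint handling are hypothetical.

```python
from datetime import datetime
from typing import Optional


def completed_rows_since(rows, last_captured_start: Optional[datetime]):
    """Return rows from finished buckets that started after the checkpoint.

    `rows` is assumed to be a list of dicts with at least the keys
    `bucket_done` (bool) and `bucket_start_time` (datetime).
    A checkpoint of None means nothing has been captured yet.
    """
    selected = [
        row
        for row in rows
        if row["bucket_done"]
        and (
            last_captured_start is None
            or row["bucket_start_time"] > last_captured_start
        )
    ]
    # Oldest bucket first, so a consumer can ship them in order.
    return sorted(selected, key=lambda row: row["bucket_start_time"])


def next_checkpoint(selected, last_captured_start: Optional[datetime]):
    """Advance the checkpoint to the newest bucket that was just selected."""
    if not selected:
        return last_captured_start
    return max(row["bucket_start_time"] for row in selected)
```

Comparing on the start time rather than on a bucket number is a design choice of this sketch: it stays correct if bucket numbers are reused. In the common case the selection yields one bucket per capture; after an agent restart or a network issue it simply yields every finished bucket since the checkpoint.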
Jiří Čtvrtka June 13, 2022 at 7:58 AM
If I got what we're trying to solve right, then my suggestion is:
Create another view (something like pg_stat_monitor_pending) where rows for the unfinished bucket are stored until the bucket is done. Once the bucket is done, its rows are moved into the pg_stat_monitor view; the pending view is then empty and a new bucket can be created.
Because right now I think we're mixing two things (apples and oranges) in one view, since an unfinished bucket is not the same as a finished one. The advantage of two views is that if PGSM crashes for some reason, we know which buckets are OK and which are not.
A pending workflow is quite common in banking applications when processing money transactions.
What do you think? And others please feel free to comment on my proposal.
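The pending/done split proposed above can be illustrated with a toy in-memory model. This is only a sketch of the workflow, not how pg_stat_monitor stores buckets; the class and method names are hypothetical, and the two attributes stand in for the suggested pg_stat_monitor_pending view and the existing pg_stat_monitor view.

```python
class BucketStore:
    """Toy model of the proposed two-view workflow."""

    def __init__(self):
        # Rows of the current, unfinished bucket (the "pending" view).
        self.pending = []
        # Finished buckets keyed by bucket id (the "done" view).
        self.done = {}

    def record(self, query, calls=1):
        """Add a row to the unfinished bucket."""
        self.pending.append({"query": query, "calls": calls})

    def finish_bucket(self, bucket_id):
        """Move all pending rows into the done view and start a new bucket."""
        self.done[bucket_id] = self.pending
        self.pending = []


store = BucketStore()
store.record("SELECT 1")
store.record("SELECT 2", calls=3)
store.finish_bucket(0)      # bucket 0 is now finished
store.record("SELECT 3")    # belongs to the next, still pending bucket
```

After these calls, store.done holds the two rows of bucket 0 and store.pending holds only the row recorded afterwards, so a reader of the done view never sees a partially filled bucket.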
Details
Assignee: Anastasia Alexandrova
Reporter: naeem.akhter
Time tracking: 4h logged
Priority: High

How do we know when a bucket is “Done” vs. still being the current/last one? For most observability needs we should look at the last completed bucket.
(Ref: Doc shared by Peter)
https://docs.google.com/document/d/1ocWYHl1fLx5wF6xpdlXRXV5iqytNxL5uZerFPGngxvs/edit