pg_stat_monitor: Bucket is “Done” vs still being current/last

Description

How do we know when a bucket is “Done” versus still being the current (last) one? For most observability needs we should look at the last completed bucket.

 

(Ref: Doc shared by Peter)

https://docs.google.com/document/d/1ocWYHl1fLx5wF6xpdlXRXV5iqytNxL5uZerFPGngxvs/edit 

How to document

None

How to test

None

Attachments

2

Activity

Hamid Akhtar January 16, 2023 at 12:17 PM
Edited

naeem.akhter January 4, 2023 at 1:23 PM

We have added a new column, bucket_done, in PGSM, so the documentation might need an update.

naeem.akhter January 4, 2023 at 1:01 PM

Test case PR Merged. Verified.

Peter Zaitsev June 13, 2022 at 7:06 PM

Hm,

I'm starting to get confused here

1) The main use case of pg_stat_monitor is very low-overhead in-memory storage. As such, if PostgreSQL crashes, it is expected that all data will be lost. If we are persisting something, we probably should not, as this adds overhead that we do not want.

2) For PMM or any other monitoring application, the goal is to consume only completed buckets. If you think about it, as soon as a bucket is "completed" it can be shipped to PMM and persisted, because it is immutable. That makes the logic pretty simple compared with trying to do some timestamp path etc.

3) Having said that, for some use cases you may want to see the recently completed queries too, so having data completely invisible until the bucket is finished is not a good option.

4) Whatever magic we need to do with views, we need to understand that PMM data capture is a rather common operation (once a minute by default), and we need to make sure it is heavily optimized. Specifically, we want to make sure PMM does not need to transfer data it does not really need, which means it needs to get all the data in completed buckets after the last bucket it successfully captured. This should be 1 bucket in 99.9% of cases, but in case of an agent restart, network issues, etc. it might be more than one bucket.
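To make point 4 concrete, here is a rough Python sketch of how a consumer could pick up only completed buckets newer than the last one it shipped. It is an illustration only, not how PMM actually does it: the helper names `completed_rows_since` and `advance_cursor` are hypothetical, rows are plain dicts, and it assumes each row carries a `bucket_done` flag and a `bucket_start_time` timestamp. The start time is used as the cursor on the assumption that bucket numbers may be reused.

```python
from datetime import datetime


def completed_rows_since(rows, last_shipped_start):
    """Return rows of completed buckets that started after the cursor.

    rows: iterable of dicts with 'bucket_done' (bool) and
          'bucket_start_time' (datetime) keys (assumed row shape).
    last_shipped_start: start time of the last bucket already shipped,
          or None if nothing has been shipped yet.
    """
    selected = [
        r for r in rows
        if r["bucket_done"]
        and (last_shipped_start is None
             or r["bucket_start_time"] > last_shipped_start)
    ]
    # Ship the oldest bucket first so the cursor only ever moves forward.
    selected.sort(key=lambda r: r["bucket_start_time"])
    return selected


def advance_cursor(selected, last_shipped_start):
    """Return the new cursor after 'selected' was shipped successfully."""
    if not selected:
        return last_shipped_start
    return max(r["bucket_start_time"] for r in selected)


rows = [
    {"bucket": 0, "bucket_start_time": datetime(2022, 6, 13, 19, 0), "bucket_done": True},
    {"bucket": 1, "bucket_start_time": datetime(2022, 6, 13, 19, 1), "bucket_done": True},
    {"bucket": 2, "bucket_start_time": datetime(2022, 6, 13, 19, 2), "bucket_done": False},
]
cursor = datetime(2022, 6, 13, 19, 0)          # bucket 0 was already shipped
to_ship = completed_rows_since(rows, cursor)   # only bucket 1 qualifies
cursor = advance_cursor(to_ship, cursor)
```

In the normal case this returns exactly one bucket per poll. After an agent restart or a network outage the same call returns every completed bucket that was missed, and the in-progress bucket is never shipped.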

Jiří Čtvrtka June 13, 2022 at 7:58 AM

If I got what we're trying to solve right, then my suggestion is:

Create another view (something like pg_stat_monitor_pending) where rows for the unfinished bucket are stored until the bucket is done. Once the bucket is done, its rows are moved into the pg_stat_monitor view; the pending view is then empty and a new bucket can be created.

Right now I think we're mixing two things (apples and oranges) in one view, since an unfinished bucket is not the same as a finished one. The advantage of two views is that if PGSM crashes for some reason, we know which buckets are OK and which are not.

The pending workflow is quite common in banking applications when processing money transactions.

What do you think? Others, please feel free to comment on my proposal.
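The pending/completed split proposed above can be sketched in a few lines of Python. This is only a toy in-memory model of the idea, not PGSM code: the `BucketStore` class and its method names are hypothetical, and the two lists stand in for the proposed pg_stat_monitor_pending view and the pg_stat_monitor view.

```python
class BucketStore:
    """Toy model of the proposed two-view layout (hypothetical names)."""

    def __init__(self):
        self.pending = []    # rows of the bucket that is still being filled
        self.completed = []  # rows of finished buckets, never changed again

    def record(self, row):
        """Add a row to the current, unfinished bucket."""
        self.pending.append(row)

    def finish_bucket(self):
        """Move all pending rows to the completed set and start a new bucket."""
        self.completed.extend(self.pending)
        self.pending = []


store = BucketStore()
store.record({"query": "SELECT 1", "calls": 3})
before = (list(store.pending), list(store.completed))  # row is only pending
store.finish_bucket()
after = (list(store.pending), list(store.completed))   # row is now completed
```

A reader that only looks at `completed` never sees a half-filled bucket, while a reader interested in recent activity can still look at `pending`.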

Done

Details

Time tracking

4h logged

Created December 31, 2021 at 3:02 PM
Updated March 5, 2024 at 9:32 PM
Resolved January 4, 2023 at 1:01 PM