CPU Utilization graph for RDS instances does not match what CloudWatch reports
General
Escalation
Description
How to test
Add an RDS instance to PMM.
Generate load on the instance to increase CPU usage.
Compare the CPU usage data shown in CloudWatch with the CPU usage data shown in PMM.
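The comparison in the last step can also be done on exported numbers rather than by eye. The following is only a minimal sketch, assuming the CPU utilization samples from both sources have already been exported as mappings from a timestamp (here, minutes since the start of the test) to a percentage. The function name `compare_cpu_series`, the 5-point tolerance, and all sample values are hypothetical and are not taken from this ticket.

```python
def compare_cpu_series(cloudwatch, pmm, tolerance_pct=5.0):
    """Return (timestamp, cloudwatch_value, pmm_value, difference) for every
    timestamp present in both series where the absolute difference between
    the two readings exceeds tolerance_pct percentage points."""
    mismatches = []
    # Only timestamps that exist in both series can be compared directly.
    for ts in sorted(cloudwatch.keys() & pmm.keys()):
        diff = abs(cloudwatch[ts] - pmm[ts])
        if diff > tolerance_pct:
            mismatches.append((ts, cloudwatch[ts], pmm[ts], diff))
    return mismatches


# Hypothetical sample values (minute offset -> CPU %), for illustration only.
cloudwatch_samples = {0: 40.0, 5: 50.0, 10: 20.0}
pmm_samples = {0: 10.0, 5: 48.0, 10: 2.0}

mismatches = compare_cpu_series(cloudwatch_samples, pmm_samples)
for ts, cw_value, pmm_value, diff in mismatches:
    print(f"t+{ts}min CloudWatch={cw_value:.1f}% PMM={pmm_value:.1f}% diff={diff:.1f}")
```

An empty result would suggest the two sources agree within the chosen tolerance for the overlapping samples.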
How to document
None
Attachments
11
Activity

Nailya Kutlubaeva May 23, 2023 at 1:21 PM
Verified
PMM:
CloudWatch:

Aaditya Dubey March 1, 2023 at 4:38 PM
Hi,
Thank you for the report.
The reported issue can be reproduced in PMM 2.33; please check the attached screenshots.
However, PMM 2.35 does not appear to be affected. Please check the behaviour in 2.35 and let us know.

uday.rajarapu March 1, 2023 at 4:19 AM
Hi Platform Team,
Please check this issue and let us know how it can be resolved for the customer.
Regards,
Uday Rajarapu
Managed Services, Percona.
Done
Details
Assignee

Reporter

Priority
Components
Labels
Needs QA
Yes
Needs Doc
No
Planned Version/s
Fix versions
Story Points
3
Environment
PMM 2.33
RDS MariaDB 5.5
Created February 23, 2023 at 11:03 PM
Updated March 5, 2024 at 11:56 PM
Resolved May 24, 2023 at 8:22 AM
A client reported high CPU utilization on one of their RDS instances, but when we checked PMM, it showed CPU utilization of nearly 0%.
The client shared the graph from the console, and it indeed doesn't match. I checked other environments and they show the same issue.
I'm attaching 3 images:
RDS CloudWatch: shows utilization of around 20-55%.
CPU Utilization PMM: shows utilization that is always below 25%.
PMM - CPU Utilization - TOP USAGE (OS Overview): shows a maximum usage of 45.60%, which looks more accurate given the CloudWatch graphs.
Note that the time ranges of these graphs do not line up exactly because we added the node to PMM at 14:45 PST, but they are still enough evidence to give an idea of the CPU usage reported around that time.
I checked other environments and the story is similar: there is a data mismatch, with PMM reporting much lower CPU utilization than CloudWatch.
In theory, the information should be the same, since PMM collects its metrics from CloudWatch, shouldn't it?
I tried changing the resolution to 5 minutes, as had been suggested, but the data still didn't match.
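For a like-for-like comparison at a 5-minute resolution, both series need to be averaged over the same windows first. The sketch below shows one way to do that, assuming the samples are available as (epoch seconds, CPU %) pairs. The helper name `average_per_bucket` and the sample values are hypothetical, and this is not a description of how PMM or CloudWatch aggregate data internally.

```python
from collections import defaultdict


def average_per_bucket(samples, bucket_seconds=300):
    """Group (timestamp_seconds, value) samples into fixed-size windows and
    return {window_start_seconds: mean value of the samples in that window}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Round the timestamp down to the start of its window.
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical 1-minute samples (seconds, CPU %), for illustration only.
samples = [(0, 10.0), (60, 20.0), (120, 30.0), (300, 50.0), (360, 70.0)]
averaged = average_per_bucket(samples)
print(averaged)
```

Note that averaging smooths out short peaks, so a 5-minute average can sit well below the maximum seen in a finer-grained graph.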
There is concern that PMM is not showing reliable information.