The PMM server is under excessive CPU load caused by ClickHouse.
Description
How to test
Step 1:
Start PMM 2.41.2 using docker:
Step 2:
Add MySQL service to PMM:
Step 3:
Start sysbench read/write load on MySQL instance:
Step 4:
Open PMM in a browser and navigate to QAN.
Step 5:
Open QAN with 30-day, 60-day, and 90-day time ranges in three separate browser sessions.
Step 6:
Run the htop command to check the CPU usage:
I observed that ClickHouse CPU usage exceeds 200% with just a single monitored service, so I expect the CPU load to increase dramatically once the number of services grows beyond 50.
Please note that I tested with only a single MySQL service, and the impact is already visible. QAN also starts showing "Unknown Error" / "Internal Error" messages and becomes partly unresponsive. Please check the attached screenshot.
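As a non-interactive alternative to watching htop in Step 6, the check could be scripted. The following is only a minimal sketch, not part of the original reproduction: it assumes a Linux environment where the ps command is available and the ClickHouse process is visible, and it assumes the process name contains "clickhouse". The function names and the 200% threshold are illustrative. Note that ps reports CPU usage averaged over the process lifetime, so its numbers lag behind what htop shows during a short spike.

```python
import subprocess


def parse_ps(output: str) -> list[tuple[int, float, str]]:
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # ignore malformed lines
        pid, pcpu, comm = parts
        rows.append((int(pid), float(pcpu), comm.strip()))
    return rows


def busy_processes(rows, name_fragment="clickhouse", threshold=200.0):
    """Return rows whose command contains name_fragment and whose CPU% >= threshold.

    Both defaults are assumptions made for this sketch.
    """
    return [r for r in rows if name_fragment in r[2].lower() and r[1] >= threshold]


def sample_ps() -> str:
    """Take one snapshot of all processes (requires the `ps` binary)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def report() -> None:
    """Print every ClickHouse-like process at or above the threshold."""
    for pid, cpu, comm in busy_processes(parse_ps(sample_ps())):
        print(f"PID {pid} ({comm}) is using {cpu:.1f}% CPU")
```

Calling report() while the three QAN sessions are open prints one line per matching process, or nothing if usage stays below the threshold.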
How to document
Attachments
Activity
Naresh May 6, 2024 at 8:36 AM
Thanks for the update.
Aaditya Dubey May 6, 2024 at 8:17 AM
Hi
We are unable to provide an ETA for fixing the issue due to several factors:
The complexity of the fix/feature.
The possibility of introducing new bugs or breaking existing applications.
Other bug fixes that cannot wait may affect the pace of the engineering work and delay the resolution.
There is always a chance that a fix or feature may break other parts of the code, and fixing it in the GA version can be undesirable.
Because of this, we cannot provide an ETA right now, but we can guarantee that the team is actively working on the bug/feature.
Naresh May 6, 2024 at 7:14 AM
Is there any update on this?
Nurlan Moldomurov April 10, 2024 at 8:50 AM (edited)
Could you add information about your instance (CPU, RAM, etc.)?
Aaditya Dubey April 8, 2024 at 12:46 PM (edited)
Hi
Thank you for the steps to reproduce the issue.
I observed that with a single MySQL service added, ClickHouse CPU usage goes to 200%; after running it for some time, I also see the error/warning below:
Sending this report to engineering for investigation and updates.
The PMM server is under excessive CPU load caused by ClickHouse. On the QAN dashboard, selecting 30, 60, or 90 days of data causes excessive CPU utilisation, and the GUI becomes unresponsive.