Issues
- PXC-4047 (resolved): DBaas Pod running out of memory with PXC (Kamil Holubicki)
- PMM-11428 (resolved): PMM DBaaS edit db cluster broken (Iaroslavna Soloveva)
- PMM-11361 (resolved): [DBaaS] Internal Server Error on creating multiple DB clusters (Yusaf Awan)
- PMM-11333 (resolved): DBaaS: API key missing after move to OLM
- PMM-11311 (resolved): [DBaaS] DB type can not be changed to MongoDB while creating DB cluster (Iaroslavna Soloveva)
- PMM-11183 (resolved): Default values in PMM DBaaS in Operators (Andrei Minkin)
- PMM-11121 (resolved): DBaaS: list of DB clusters doesn't load if one of my k8s clusters is not responding (Iaroslavna Soloveva)
- PMM-10627 (resolved): [DBaaS] [FE] Ability to create single psmdb node cluster (Carlos Salguero)
- PMM-9682 (resolved): Improve the response speed for the List of DB Clusters created with DBaaS (Yusaf Awan)
- PMM-9681 (resolved): Support of large scale environments
- PMM-7918 (resolved): [DBaaS] resource calculator fixes (peter.szczepaniak)
- PMM-7175 (resolved): DBaaS API
- K8SPXC-1082 (resolved): Ability to ignore specific annotations for k8s Service objects
- K8SPSMDB-824 (resolved): Ability to ignore specific annotations for k8s Service objects (Andrii Dema)
- K8SPSMDB-786 (resolved): When changing to allowUnsafeConfigurations: true, cluster goes to failures and mongos does not get to Ready state (Pavel Tankov)
- EVEREST-52 (resolved): A new design for REST API for DBaaS (Oksana Grishchenko)
- EVEREST-22: [DBaaS] Create DB Cluster: Network and security (Taras Kozub)
- EVEREST-12: [DBaaS] Can't load imdb dataset to pxc cluster running under dbaas on EKS
DBaas Pod running out of memory with PXC
Details
Assignee: Kamil Holubicki
Reporter: Tibor Korocz (Percona)
Needs QA: Yes
Priority: Medium
Activity
Kamil Holubicki, March 1, 2023 at 5:02 PM
My understanding is that there are no more mysteries in this matter: everything is clear, we all know what happens, why it happens, and how to solve it, so I'm closing this ticket.
Kamil Holubicki, December 7, 2022 at 11:20 AM
The following things are related to DBaaS.
Problem 1: When using global wsrep_trx_fragment_size/wsrep_trx_fragment_unit everything works fine, but when using session variables the pod is OOM-killed.
Problem 2: During the load it is visible that the client reconnects to the server.
Problem 3: After about 60 seconds of client inactivity, the next query causes a reconnection.
Problem 4: Calculation of the InnoDB buffer pool size (and maybe of the max_connections parameter). Right now, for a 2G pod, the buffer pool is set to 1G and gcache.size is 600M. Observed pod memory consumption is 2G, so we are right at the limit; any memory pressure on the node could get the pod OOM-killed. I think we need a smaller buffer pool to be compliant with the calculations explained in the previous comment.
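As a rough cross-check against the ~600MB estimate for (E) in the comment below: 1G buffer pool + 600M gcache + ~600M of other MySQL allocations comes to about 2.2G, which already exceeds the 2G pod limit, so any additional pressure gets the pod OOM-killed.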
Conclusion 1: Problem 1 is caused by Problem 2. When the client reconnects, the session variables set previously are lost, so the load continues without streaming replication (we are back in the state we started this ticket with). There are two possible solutions (a sketch follows the list):
Use global variables
Set session and global variables before the load, and restore the defaults after the load
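As an illustration of the second option, a minimal SQL sketch, assuming the 3670016-byte fragment size suggested in the comment below and the stock defaults when restoring:

    -- before the bulk load: enable streaming replication for this session
    SET SESSION wsrep_trx_fragment_unit = 'bytes';
    SET SESSION wsrep_trx_fragment_size = 3670016;  -- ~3.5M fragments

    -- ... run myloader / load the SQL files on this same connection ...

    -- after the load: back to the default (0 disables streaming replication)
    SET SESSION wsrep_trx_fragment_size = 0;

Note that this only helps while the client stays on the same connection; after a reconnection (Problem 2) the session settings are gone. Setting the same values with SET GLOBAL (the first option) survives reconnections because new sessions pick up the global values.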
Conclusion 2: Problem 3 is caused by the HAProxy setup. Adding the following to db config solves the problem
However, it does not solve the load-vs-session-variables problem; network failures may still occur and cause reconnections.
Denys Kondratenko, December 7, 2022 at 10:58 AM
Could you please provide a summary of the recent findings from Slack.
volunteered to provide different recommended configurations for different types of workloads that should prevent OOMs. Could you also check https://jira.percona.com/browse/K8SPXC-441 and provide a recommendation for the corner case where little memory is available.
Kamil Holubicki, November 30, 2022 at 11:02 AM
I talked to on Slack and I think it is worth documenting it for the future:
Let me summarize what we've learned so far. That will be good guidance.
1. We've got the following significant memory consumers:
- (A) Buffer Pool
- (B) WriteSet Cache off pages
- (C) GCache Ring Buffer
- (D) GCache off pages
- (E) MySQL allocations
2. (A) and (C) are static/one-time allocations with defaults:
- (A) 128MB
- (C) 128MB
3. Large transactions cause OOM because of (B) and (D).
4. We should avoid (B) by setting wsrep_trx_fragment_unit='bytes' and wsrep_trx_fragment_size=3670016. This way large transactions are chunked into 3.5M fragments and streamed across the cluster while the transaction is still ongoing.
5. We should avoid (D) by setting large enough (C).
- if there are not many simultaneous write transactions, the default may be enough
- if there are many simultaneous transactions, we should increase (C). Let's say 151 connections (the default) times a 4M chunk => roughly 600M. Since we also need the previous chunk to still be present in (C), a rough estimate is 1.2G.
6. For (E) we need to run tests and see how it behaves. My tests with a single connection showed that it is about 600MB.
7. So our memory demand is M = (A) + (C) + (E)
If we go with (A) being 70% of the memory available to the pod, we've got:
Small:
(A) = 1.4G => (C) + (E) = 600M
As it is a Small instance, we can probably assume we will not have many parallel writers, so the default (C) should be enough; however, that leaves no room for (E), so (A) should be decreased.
Medium:
(A) = 5.4G => (C) + (E) = 2.4G
I think we should expect parallel writers here, so we should increase (C), let's say to 1G, which leaves 1.4G. That should be enough, but, again, we should test with a simultaneous write workload and different transaction/row sizes (not to be confused with wsrep_trx_fragment_size, which is always 3.5M). A concrete config sketch for this case follows below.
Large:
(A) = 22.4G => (C) + (E) = 9.6G
Even with a 2G (C) we are safe (depending on how (E) behaves; again, to be tested).
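To make the Medium case concrete, a minimal my.cnf sketch using the numbers from the estimate above (roughly a 7.8G pod; the values are illustrative, not a validated recommendation):

    [mysqld]
    # ~70% of pod memory for the buffer pool (A), 1G GCache ring buffer (C),
    # the rest is left for MySQL's own allocations (E)
    innodb_buffer_pool_size = 5530M                 # ~5.4G (A)
    wsrep_provider_options  = "gcache.size=1G"      # (C)
    # stream big transactions in ~3.5M fragments to avoid WriteSet Cache off pages (B)
    wsrep_trx_fragment_unit = bytes
    wsrep_trx_fragment_size = 3670016

Whether 1G of (C) is actually enough under a real parallel write workload is exactly the open question stressed at the end of this comment.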
Another perspective
Everything we have considered so far is the case of loading data, which happens in huge transactions. Is that always the case? If you don't do that, and you don't do huge (parallel) writes, here are the knobs you can manipulate (a config sketch follows the list):
1. wsrep_provider_options="gcache.size=N" - the bigger, the better, as it affects the node's ability to be a good donor for IST, but this is a one-time allocation that is never freed. So maybe a huge amount of memory is not needed for (C)? On the other hand, if a writeset does not fit into (C) (precisely: if it is bigger than (C)/2), (D) is created.
2. wsrep_trx_fragment_size=N - maybe it is not bad if WriteSet Cache pages are created sometimes? If we've got just a few write transactions and (C) is big enough not to create (D), it should not hurt.
3. gcache.page_size - the size of a single page of (D)
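For reference, a sketch of how those three knobs are spelled in the config; the values here are only placeholders:

    [mysqld]
    # (C) ring buffer size and (D) page size are both Galera provider options
    wsrep_provider_options = "gcache.size=600M; gcache.page_size=128M"
    # fragment size for streaming replication (default unit is bytes)
    wsrep_trx_fragment_size = 3670016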
And let me stress the following again:
Right now we know how the system behaves with a single writer, but we need to test it with a parallel write workload!
Sergey Pronin, November 29, 2022 at 6:55 AM
Just FYI - I tried to reproduce it on our new PS operator with Group Replication and it is not reproducible. Memory consumption stays flat and is bounded by the InnoDB buffer pool; no OOMs.
Hi,
Summary
If you create a 3-node PXC cluster and start myloader or load a bigger dataset, the pod will run out of memory and be killed.
More details can be found in this doc: https://docs.google.com/document/d/1EYdnqyxmRrgtOAQUQDdF_FYKvGAwUlxHQx-YILy02wQ/edit#
Reproducing
We were able to reproduce it by loading back a backup with myloader and also by simply loading the SQL files of the public imdb database.
Notes
I think we are facing this issue: https://github.com/kubernetes/kubernetes/issues/43916
But it only happens with PXC: if I disable the Galera plugin so that the same pod runs only a standalone MySQL, I was not able to reproduce the issue. It only happens when Galera is enabled; I think it is because of GCache and the way Kubernetes calculates used memory, as described in the GitHub ticket above.