While testing the backup and restore process of K8SPXC I noticed that there was no way (at least none that I could find) to manage the CPU/memory resources of the restore job.
After looking around a bit, I found https://github.com/percona/percona-xtradb-cluster-operator/blob/0176ba569a606f3ac08881839e6fa11de405f41f/pkg/pxc/backup/restore.go#L427 which I believe is currently how those resources come about (i.e. they match the individual PXC nodes).
While that is a sane default to some extent, in this case, because of compression and low CPU resources, the restore takes several times longer than necessary. If this happened in production I would want at least the CPU resources of that pod to be completely unbound, so as to restore service as soon as possible.
I'm not entirely certain what the right way to spec that would be; at the moment I would most likely go and massively raise the CPU requirements on my PXC nodes to affect it, but that is somewhat suboptimal (and relies on implementation details of the current version of the operator). Hopefully you will agree and have ideas on how to make this more convenient in a clean way.
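For illustration, the workaround hinted at above would amount to something like the following (hypothetical names and values), bumping the PXC node resources so that the restore job, which currently mirrors them, inherits more CPU:

# Sketch of the workaround only: oversize the PXC pod resources so the
# restore job, which copies them, is allowed more CPU. Values are made up.
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: cluster1
spec:
  pxc:
    resources:
      requests:
        cpu: "4"          # deliberately oversized just to speed up restores
        memory: 8Gi
      limits:
        cpu: "8"
        memory: 8Gi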
Environment
None
Activity
Tomislav Plavcic June 28, 2023 at 8:30 AM
Please document the behaviour from Andrii's comment. Thanks!
Tomislav Plavcic June 27, 2023 at 5:55 PM
Ok, makes sense! Thanks for checking!
Andrii Dema June 27, 2023 at 2:17 PM
It's intended to be like this. Currently we have an env variable XB_USE_MEMORY in the restore job. This variable is set to 75% of resources.limits (or of resources.requests if limits are not set). Because of that, we set resources.requests to be the same as resources.limits, to avoid the case where XB_USE_MEMORY could end up with a bigger value than resources.requests.
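To illustrate the behaviour described above (container name, value format and numbers are illustrative, not actual operator output), the restore job container ends up looking roughly like this:

# Sketch of a restore job container with XB_USE_MEMORY derived from the limits.
containers:
  - name: xtrabackup          # container name illustrative
    env:
      - name: XB_USE_MEMORY   # ~75% of the memory limit below
        value: "1536M"        # value format illustrative
    resources:
      requests:               # set equal to limits by the operator
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "1"
        memory: 2Gi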
Tomislav Plavcic June 26, 2023 at 2:54 PM
I tried to specify resources in the PXC restore object like this:
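(The original snippet did not survive the export; the following is a reconstruction of the kind of spec meant here, with made-up values and deliberately different requests and limits.)

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: cluster1
  backupName: backup1
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi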
but this is what I got in the restore job:
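(Again a reconstruction for illustration, matching the values above; the point is that the requests are overwritten.)

resources:
  requests:               # copied from the limits, not the requests I specified
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 4Gi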
Whatever I specify, it seems to use the limits for the requests as well.
Slava Sarzhan May 22, 2023 at 10:08 AM
Thank you for the task. We have added the possibility of specifying resources for the restore object. It will be available in the next PXCO release.