K8SPXC-745: pxc operator robustness improvement
Details
Assignee: Lalit Choudhary
Reporter: 朱礼程
Priority: Medium
Activity
Jira Bot, August 29, 2021 at 11:57 AM
Hello,
It's been 52 days since this issue went into Incomplete and we haven't heard
from you on this.
At this point, our policy is to close this issue, to keep things from getting
too cluttered. If you have more information about this issue and wish to
reopen it, please reply with a comment containing "jira-bot=reopen".
Jira Bot, August 21, 2021 at 11:56 AM
Hello,
It's jira-bot again. Your bug report is important to us, but we haven't heard
from you since the previous notification. If we don't hear from you on
this in 7 days, the issue will be automatically closed.
Jira Bot, August 6, 2021 at 10:57 AM
Hello,
I'm jira-bot, Percona's automated helper script. Your bug report is important
to us but we've been unable to reproduce it, and asked you for more
information. If we haven't heard from you on this in 3 more weeks, the issue
will be automatically closed.
Lalit Choudhary, July 8, 2021 at 10:16 AM
Hi,
Thank you for the report and your input.
"If the operator crashes or a required condition ends in an error, the cluster does not recover automatically (for example, when a restore fails, the PXC size information is lost).
In Kubernetes, an operator always drives the current state toward the desired final state; it is not procedure-oriented. The operator would become more stable if it followed this Kubernetes way of thinking."
PXC Operator versions 1.7.0 and 1.8.0 include a few improvements for auto-recovery.
New features in 1.7.0 and 1.8.0:
- Add support for point-in-time recovery
- PXC cluster will now recover automatically from a full crash when Pods are stuck in CrashLoopBackOff status
- Operator can now automatically recover Percona XtraDB Cluster after network partitioning
https://www.percona.com/doc/kubernetes-operator-for-pxc/ReleaseNotes/index.html
Apart from this, if you have further suggestions for improvement, it would help if you could add an example use case and describe the behavior you expect.
Feel free to add a comment here.
Currently the operator always sleeps until a required condition is met, which causes it to block. And if the operator crashes or the required condition ends in an error, the cluster does not recover automatically (for example, when a restore fails, the PXC size information is lost).
In Kubernetes, an operator always drives the current state toward the desired final state; it is not procedure-oriented. The operator would become more stable if it followed this Kubernetes way of thinking.
I am sorry that my English is poor; I don't know whether anyone will understand my idea. But I would be glad to communicate with the operator team and do my part to improve the program's robustness. If there is a group or some other way to chat with the team, I would like to know.
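To make the contrast in the description concrete, here is a minimal Python sketch of the "drive the current state toward the desired state" style, as opposed to sleeping inside one long procedure. It is not taken from the operator's code. The names `Cluster`, `Result`, `reconcile`, and `run_until_converged`, as well as the requeue intervals, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster: desired vs. observed state."""
    desired_size: int
    ready_pods: int = 0
    restore_running: bool = False


@dataclass
class Result:
    """Outcome of one reconcile pass."""
    done: bool
    requeue_after: float = 0.0  # seconds until the next pass, if not done


def reconcile(cluster: Cluster) -> Result:
    """One idempotent pass: observe, take at most one step, return.

    Nothing is remembered between passes, so if the caller crashes and
    restarts, the next pass simply re-reads the same state and continues.
    """
    if cluster.restore_running:
        # Condition not met yet: do not sleep here, ask to be called again.
        return Result(done=False, requeue_after=5.0)
    if cluster.ready_pods < cluster.desired_size:
        cluster.ready_pods += 1  # stand-in for "create one more pod"
        return Result(done=False, requeue_after=1.0)
    if cluster.ready_pods > cluster.desired_size:
        cluster.ready_pods -= 1  # stand-in for "remove one pod"
        return Result(done=False, requeue_after=1.0)
    return Result(done=True)


def run_until_converged(cluster: Cluster, max_passes: int = 100) -> int:
    """Call reconcile() repeatedly; return the number of passes used."""
    for passes in range(1, max_passes + 1):
        if reconcile(cluster).done:
            return passes
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    c = Cluster(desired_size=3)
    print(run_until_converged(c), c.ready_pods)
```

Because each pass derives what to do only from the state it observes, a pass that is interrupted can be repeated safely. In this sketch the desired size lives in the `Cluster` object rather than in the memory of a running procedure, which is the property the description asks for.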