Cluster fails to replicate gzipped data

Description

Hi,

 

I am running Percona Kubernetes Operator 1.7.0 and have come across an issue multiple times where the writer pod fails to replicate some data, which results in a broken cluster and the writer pod dying. I have tried deleting that pod and recreating it; sometimes this brings it back up, and sometimes it still fails with a strange corrupted-data string.
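Recreating the pod here just means deleting it and letting the StatefulSet bring it back up, roughly:

kubectl delete pod cluster1-pxc-0
# the StatefulSet controller recreates the pod automatically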

I have tried checking the logs on the other two pods for the same timestamp at which pxc-0 broke, and I can't see any errors.
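For reference, the log checks on the other two nodes were roughly along these lines (the container name and time window are illustrative, not the exact commands used):

kubectl logs cluster1-pxc-1 -c logs --since-time=2021-06-18T00:00:00Z | grep -iE 'error|warn'
kubectl logs cluster1-pxc-2 -c logs --since-time=2021-06-18T00:00:00Z | grep -iE 'error|warn'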

 

Kubernetes Status:

[root@openair01 alir]# kubectl get all
NAME                                                  READY   STATUS      RESTARTS   AGE
pod/cluster1-haproxy-0                                2/2     Running     422        32d
pod/cluster1-haproxy-1                                2/2     Running     414        32d
pod/cluster1-haproxy-2                                2/2     Running     1050       87d
pod/cluster1-pxc-0                                    2/3     Running     0          3h45m
pod/cluster1-pxc-1                                    3/3     Running     4          86d
pod/cluster1-pxc-2                                    3/3     Running     1          9d
pod/daily-backup-1624579200-cmfxm                     0/1     Completed   0          14h
pod/percona-xtradb-cluster-operator-d65896bcb-tv8rj   1/1     Running     0          108d

 

Logs from the broken pod:

=========================================================

 

[0] pxc.cluster1-pxc-0.mysqld-error.log: [1623974459.667014052, {"log"=>"2021-06-18T00:00:59.666903Z 0 [Note] [MY-000000] [Galera] 2.0 (cluster1-pxc-2): State transfer to 0.0 (garb) complete."}]
[0] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.421204313, {"log"=>"2021-06-18T00:01:05.985776Z 0 [Note] [MY-000000] [Galera] (c8ffd51e-b4d5, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://192.168.215.5:4567 "}]
[1] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.422334131, {"log"=>"2021-06-18T00:01:06.422229Z 0 [Note] [MY-000000] [Galera] declaring 3dd15c7b-b945 at ssl://192.168.209.6:4567 stable"}]
[2] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.422338075, {"log"=>"2021-06-18T00:01:06.422295Z 0 [Note] [MY-000000] [Galera] declaring a8608e4c-bdfa at ssl://192.168.211.5:4567 stable"}]
[3] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.422339143, {"log"=>"2021-06-18T00:01:06.422319Z 0 [Note] [MY-000000] [Galera] forgetting 33be5785-8659 (ssl://192.168.215.5:4567)"}]
[4] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.422419705, {"log"=>"2021-06-18T00:01:06.422369Z 0 [Note] [MY-000000] [Galera] (c8ffd51e-b4d5, 'ssl://0.0.0.0:4567') turning message relay requesting off"}]
[5] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.424782266, {"log"=>"2021-06-18T00:01:06.424679Z 0 [Note] [MY-000000] [Galera] Node 3dd15c7b-b945 state primary"}]
[6] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426430504, {"log"=>"2021-06-18T00:01:06.426337Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node"}]
[7] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426433955, {"log"=>"view (view_id(PRIM,3dd15c7b-b945,283)"}]
[8] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426434825, {"log"=>"memb {"}]
[9] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426435480, {"log"=>" 3dd15c7b-b945,0"}]
[10] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426436024, {"log"=>" a8608e4c-bdfa,0"}]
[11] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426436537, {"log"=>" c8ffd51e-b4d5,0"}]
[12] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426437035, {"log"=>" }"}]
[13] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426437509, {"log"=>"joined {"}]
[14] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426437963, {"log"=>" }"}]
[15] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426438416, {"log"=>"left {"}]
[16] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426438904, {"log"=>" }"}]
[17] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426439358, {"log"=>"partitioned {"}]
[18] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426439848, {"log"=>" 33be5785-8659,0"}]
[19] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426440322, {"log"=>" }"}]
[20] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426440822, {"log"=>")"}]
[21] pxc.cluster1-pxc-0.mysqld-error.log: [1623974466.426441407, {"log"=>"2021-06-18T00:01:06.426396Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk"}]
[22] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.416311702, {"log"=>"2021-06-18T00:01:07.416212Z 0 [Note] [MY-000000] [Galera] forgetting 33be5785-8659 (ssl://192.168.215.5:4567)"}]
[23] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.416575121, {"log"=>"2021-06-18T00:01:07.416508Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3"}]
[24] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.416598594, {"log"=>"2021-06-18T00:01:07.416574Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID."}]
[25] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.454755196, {"log"=>"2021-06-18T00:01:07.454607Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: 4812a63f-cfc8-11eb-969c-3359919c0dd6"}]
[26] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.456973731, {"log"=>"2021-06-18T00:01:07.456895Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 4812a63f-cfc8-11eb-969c-3359919c0dd6 from 0 (cluster1-pxc-1)"}]
[27] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457027423, {"log"=>"2021-06-18T00:01:07.456980Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 4812a63f-cfc8-11eb-969c-3359919c0dd6 from 1 (cluster1-pxc-2)"}]
[28] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457082776, {"log"=>"2021-06-18T00:01:07.457054Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 4812a63f-cfc8-11eb-969c-3359919c0dd6 from 2 (cluster1-pxc-0)"}]
[29] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457096882, {"log"=>"2021-06-18T00:01:07.457086Z 0 [Note] [MY-000000] [Galera] Quorum results:"}]
[30] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457097740, {"log"=>" version = 6,"}]
[31] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457098468, {"log"=>" component = PRIMARY,"}]
[32] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457098949, {"log"=>" conf_id = 6,"}]
[33] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457099429, {"log"=>" members = 3/3 (primary/total),"}]
[34] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457099916, {"log"=>" act_id = 259861458,"}]
[35] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457100497, {"log"=>" last_appl. = 259861149,"}]
[36] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457100997, {"log"=>" protocols = 2/10/4 (gcs/repl/appl),"}]
[37] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457101565, {"log"=>" vote policy= 0,"}]
[38] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457102072, {"log"=>" group UUID = 3bf7640a-8093-11eb-98d8-9eaec930f796"}]
[39] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457212449, {"log"=>"2021-06-18T00:01:07.457183Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [173, 173]"}]
[40] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457350233, {"log"=>"2021-06-18T00:01:07.457297Z 1 [Note] [MY-000000] [Galera] ####### processing CC 259861459, local, ordered"}]
[41] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457383621, {"log"=>"2021-06-18T00:01:07.457351Z 1 [Note] [MY-000000] [Galera] Maybe drain monitors from 259861458 upto current CC event 259861459 upto:259861458"}]
[42] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457400087, {"log"=>"2021-06-18T00:01:07.457379Z 1 [Note] [MY-000000] [Galera] Drain monitors from 259861458 up to 259861458"}]
[43] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457415171, {"log"=>"2021-06-18T00:01:07.457402Z 1 [Note] [MY-000000] [Galera] ####### My UUID: c8ffd51e-ce9c-11eb-b4d5-ff0aeb0e2b60"}]
[44] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457430104, {"log"=>"2021-06-18T00:01:07.457421Z 1 [Note] [MY-000000] [Galera] Skipping cert index reset"}]
[45] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457449857, {"log"=>"2021-06-18T00:01:07.457438Z 1 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5)"}]
[46] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457467741, {"log"=>"2021-06-18T00:01:07.457456Z 1 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 259861458 -> 259861459"}]
[47] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.457554363, {"log"=>"2021-06-18T00:01:07.457499Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed."}]
[48] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496095994, {"log"=>"2021-06-18T00:01:07.496001Z 1 [Note] [MY-000000] [Galera] ================================================"}]
[49] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496100932, {"log"=>"View:"}]
[50] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496101876, {"log"=>" id: 3bf7640a-8093-11eb-98d8-9eaec930f796:259861459"}]
[51] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496102534, {"log"=>" status: primary"}]
[52] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496103167, {"log"=>" protocol_version: 4"}]
[53] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496103893, {"log"=>" capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO"}]
[54] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496104709, {"log"=>" final: no"}]
[55] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496105155, {"log"=>" own_index: 2"}]
[56] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496105568, {"log"=>" members(3):"}]
[57] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496106051, {"log"=>" 0: 3dd15c7b-9164-11eb-b945-976cfaf75f5d, cluster1-pxc-1"}]
[58] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496106529, {"log"=>" 1: a8608e4c-ce9c-11eb-bdfa-2beec802720e, cluster1-pxc-2"}]
[59] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496106932, {"log"=>" 2: c8ffd51e-ce9c-11eb-b4d5-ff0aeb0e2b60, cluster1-pxc-0"}]
[60] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496107400, {"log"=>"================================================="}]
[61] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.496143656, {"log"=>"2021-06-18T00:01:07.496075Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification."}]
[62] pxc.cluster1-pxc-0.mysqld-error.log: [1623974467.664419475, {"log"=>"2021-06-18T00:01:07.662207Z 0 [Note] [MY-000000] [Galera] Member 1.0 (cluster1-pxc-2) synced with group."}]
[63] pxc.cluster1-pxc-0.mysqld-error.log: [1623974469.316289985, {"log"=>"2021-06-18T00:01:09.316063Z 1 [Note] [MY-000000] [Galera] Recording CC from group: 259861459"}]
[64] pxc.cluster1-pxc-0.mysqld-error.log: [1623974469.316590877, {"log"=>"2021-06-18T00:01:09.316454Z 1 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 259861150"}]
[65] pxc.cluster1-pxc-0.mysqld-error.log: [1623974469.316745061, {"log"=>"2021-06-18T00:01:09.316629Z 1 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 259860187"}]
[0] pxc.cluster1-pxc-0.mysqld-error.log: [1623974471.919678185, {"log"=>"2021-06-18T00:01:11.919567Z 0 [Note] [MY-000000] [Galera] cleaning up 33be5785-8659 (ssl://192.168.215.5:4567)"}]
[0] pxc.cluster1-pxc-0.mysqld-error.log: [1623974493.866719803, {"log"=>"2021-06-18T00:01:33.866577Z 0 [Warning] [MY-000000] [Galera] Failed to report last committed 3bf7640a-8093-11eb-98d8-9eaec930f796:259862076, -110 (Connection timed out)"}]
[0] pxc.cluster1-pxc-0.mysqld-error.log: [1623975536.238212243, {"log"=>"2021-06-18T00:18:56.235471Z 0 [Warning] [MY-000000] [Galera] failed to replay trx: source: c8ffd51e-ce9c-11eb-b4d5-ff0aeb0e2b60 version: 5 local: 1 flags: 1 conn_id: 1870951 trx_id: 134507796 tstamp: 1623975532667523604; state: EXECUTING:0->REPLICATING:747->CERTIFYING:3439->APPLYING:3677->COMMITTING:1307->COMMITTED:-1"}]
[1] pxc.cluster1-pxc-0.mysqld-error.log: [1623975536.238788847, {"log"=>"2021-06-18T00:18:56.236935Z 0 [Warning] [MY-000000] [Galera] Invalid state in replay for trx source: c8ffd51e-ce9c-11eb-b4d5-ff0aeb0e2b60 version: 5 local: 1 flags: 1 conn_id: 1870951 trx_id: 134507796 tstamp: 1623975532667523604; state: EXECUTING:0->REPLICATING:747->CERTIFYING:3439->APPLYING:3677->COMMITTING:1307->COMMITTED:-1 (FATAL)"}]
[2] pxc.cluster1-pxc-0.mysqld-error.log: [1623975536.239011629, {"log"=>" at galera/src/replicator_smm.cpp:replay_trx():1211"}]
[3] pxc.cluster1-pxc-0.mysqld-error.log: [1623975536.246329534, {"log"=>"2021-06-18T00:18:56.240787Z 1870951 [ERROR] [MY-000000] [WSREP] trx_replay failed for: 6, schema: OpenAIR_asset, query: update `OpenAIR_monitoring`.`network_switch_port_monitor` set `utilization_data` = 'x\�\��ݒ$9�$��\�\�\\s
�g0 _�#<\�9#$w�\�\�\�9\�w��g�#ҡ��\�WUS�M�[te�03���\������/��?������\�����\���\�����)Ϲ\����\�\��ϧ�Z�o���\�\�����\�\��\�_����g�����\�?�!�\�\��\��&���\�����������/��e)\�\�\�^� {
�|\����\��\��������\���O_�\�\�c�\�{�ݫ��\�g�\�/��O_\��\�%&+5�\�zȥ�\\�?�秕P��r2k1w�|\��q=ƀOL\�Kﵷ�?�\���\���\�?�ȟ�\��\����\�>�\���\�����Ƈ�9\�?��e��\�?�?�\�\�\Zz�A.)Ĩ@~=>�\\�\�\��d
M��b=�\��\�Ђ�#\�%����h�\��\��q\�\��5\�\����1n5bA\�\�\�e\�\�+��3\�5\�\�g�\��\�g�Gį!�▒▒zJ�\'��\�$��xX��\�/m��\0�\�\�C�\\S�r\'\�\�Қ\��{ɵ.\0\�\"\�V�;�_\�j߃p�J&vl%\�%\�8�||B��p�f?.]�`�\�òY��r\��>/\����
�؄�U.\�T��cZ*��[\�\�D/���a��:#\�)�@�\�\��\�g�G\��[

...................................�\��\�\0\�
�F$[�U\�b�K"}]
[4] pxc.cluster1-pxc-0.mysqld-error.log: [1623975536.246341240, {"log"=>"2021-06-18T00:18:56.243318Z 1870951 [ERROR] [MY-010119] [Server] Aborting"}]
[5] pxc.cluster1-pxc-0.mysqld-error.log: [1623975536.246342046, {"log"=>"2021-06-18T00:18:56.244676Z 1870951 [Note] [MY-000000] [WSREP] Initiating SST cancellation"}]
[6] pxc.cluster1-pxc-0.mysqld-error.log: [1623975538.253399557, {"log"=>"2021-06-18T00:18:58.252992Z 1870951 [Warning] [MY-000000] [Server] /usr/sbin/mysqld: Forcing close of thread 1871464 user: 'ix_dbuser'"}]
[7] pxc.cluster1-pxc-0.mysqld-error.log: [1623975538.253406618, {"log"=>"2021-06-18T00:18:58.253097Z 1870951 [Warning] [MY-000000] [Server] /usr/sbin/mysqld: Forcing close of thread 1871576 user: 'ix_dbuser'"}]

 

=========================================================

 

 

Please let me know if you need more information.


Environment

Percona Kubernetes Operator 1.7.0

Percona XtraDB Cluster: 8.0.21-12.1

HAProxy: 1.7.0-haproxy

Data is inserted using PHP 7.2 and is compressed with PHP's gzip (zlib) functions.

The column into which I am inserting the compressed data is a MEDIUMBLOB.
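For context, the write path looks roughly like the following. This is a simplified sketch, not the actual application code: the table and column names are taken from the failing UPDATE in the log above, while the connection details, the $portStats payload, $rowId and the WHERE column are illustrative placeholders.

<?php
// Simplified, illustrative sketch of the insert path.
// The payload is gzip-compressed in PHP and stored in a MEDIUMBLOB column.
$pdo = new PDO(
    'mysql:host=cluster1-haproxy;dbname=OpenAIR_monitoring',
    'ix_dbuser',
    $password
);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Binary (non-UTF-8) data; this is what appears garbled in the error log above.
$payload = gzcompress(json_encode($portStats), 6);

$stmt = $pdo->prepare(
    'UPDATE `OpenAIR_monitoring`.`network_switch_port_monitor`
        SET `utilization_data` = :blob
      WHERE `id` = :id'
);
$stmt->bindValue(':blob', $payload, PDO::PARAM_LOB); // bind as a BLOB, not a string
$stmt->bindValue(':id', $rowId, PDO::PARAM_INT);
$stmt->execute();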

  


Activity


Lalit Choudhary October 14, 2021 at 12:16 PM

Hi

Thank you for the report.

I do not think this is an issue with the PXC Operator. Here the PXC node is unable to replay a transaction, which results in the node aborting and exiting the cluster.

[Warning] [MY-000000] [Galera] Failed to report last committed 3bf7640a-8093-11eb-98d8-9eaec930f796:259862076, -110 (Connection timed out)
[Warning] [MY-000000] [Galera] Invalid state in replay for trx source:

The binary data you see is part of the update query.

Still, it is a good idea to upgrade to the latest PXC and Operator versions.

If you still think this is a bug, please provide details of why you think so and how we can reproduce it.

 

Sergey Pronin September 17, 2021 at 8:19 AM

Sorry for not coming back to you earlier.

Is there anything specific about your deployment?

Please share the cr.yaml and details about your Kubernetes cluster.

Also, I would suggest testing the latest version.

Details

Resolution: Not a Bug

Created June 25, 2021 at 2:34 PM
Updated March 5, 2024 at 5:48 PM
Resolved October 14, 2021 at 12:17 PM
