PXC cluster CrashLoopBackOff: Found 1 prepared transactions!

Description

Hi! One of the three replicas in my PXC 5.7 cluster is crash looping with the following error:

2021-11-22T08:57:27.796235Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.34-37 started; log sequence number 2264189188
2021-11-22T08:57:27.796281Z 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
2021-11-22T08:57:27.797037Z 0 [Note] Plugin 'FEDERATED' is disabled.
2021-11-22T08:57:27.809704Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-11-22T08:57:27.809727Z 0 [Note] InnoDB: Transaction 12760 in prepared state after recovery
2021-11-22T08:57:27.809731Z 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-11-22T08:57:27.809736Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-11-22T08:57:27.809739Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-11-22T08:57:27.809753Z 0 [Warning] WSREP: Discovered discontinuity in recovered wsrep transaction XIDs. Truncating the recovery list to 0 entries
2021-11-22T08:57:27.809757Z 0 [Note] WSREP: Last wsrep seqno to be recovered 2656
2021-11-22T08:57:27.809852Z 0 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2021-11-22T08:57:27.809862Z 0 [ERROR] Aborting
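
For anyone triaging the same abort, the relevant facts in a log like the one above can be pulled out with a few regular expressions. The following is only a minimal sketch: the function name `summarize_recovery_log` and the dictionary keys are made up for illustration, and the patterns are taken from the message wording in this particular log, so other server versions may word them differently.

```python
import re


def summarize_recovery_log(log_text):
    """Extract prepared-transaction details from mysqld recovery output.

    Works whether the log entries are on separate lines or run together,
    because each pattern is searched across the whole text.
    """
    # IDs of InnoDB transactions left in the prepared state after recovery.
    trx_ids = [
        int(m)
        for m in re.findall(
            r"InnoDB: Transaction (\d+) in prepared state after recovery", log_text
        )
    ]
    # The fatal message that makes mysqld abort.
    aborted = re.search(r"\[ERROR\] Found (\d+) prepared transactions!", log_text)
    # The last Galera sequence number that wsrep recovery reported.
    seqno = re.search(r"WSREP: Last wsrep seqno to be recovered (-?\d+)", log_text)
    return {
        "prepared_trx_ids": trx_ids,
        "aborted_on_prepared": int(aborted.group(1)) if aborted else 0,
        "last_wsrep_seqno": int(seqno.group(1)) if seqno else None,
    }
```

Feeding it the saved output of the crashing container's previous run should show which transaction IDs are stuck and whether the abort message is present.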

I'm using the operator installed with the chart pxc-operator (chart version 1.9.1) and an instance installed with the chart pxc-db (chart version 1.9.1 and MySQL version 5.7-recommended with no extra conf).

Is this a bug? I think the operator should handle this case automatically. I don't know why only 1 of the 3 replicas is crash looping while the others are OK. Moreover, all the haproxy pods in front of the PXC instances are unready, so there is no HA here.

Environment

None

Activity

Aaditya Dubey December 10, 2023 at 8:13 AM

Hi,

Closing the report because there has been no activity for a long time.

Aaditya Dubey August 12, 2022 at 1:10 PM

Hi,

Thank you for the report.
Please let me know if the issue still persists.

Antoine Ozenne January 4, 2022 at 9:15 AM

The problem occurs frequently and makes the instance unusable, because all the haproxy pods end up in CrashLoopBackOff. Moreover, I have just returned from vacation and discovered that the issue is present on 2 of the 3 pxc pods this time. I will try applying your fix again, but what can I do to investigate?

Antoine Ozenne November 26, 2021 at 4:20 PM

Hi @Mykola Marzhan,

Thanks for the fix! I deleted the PVC first and the pod afterwards. The pod then restarted 3 times before becoming ready. I checked inside the pod and all the data has been replicated, so the fix works!

To reproduce: the pod has to crash during a long transaction (about an hour in my case). When the pod restarts, it complains about the "pending" transaction.

I see the default value of innodb_flush_log_at_trx_commit is 0. Could this be the problem?

Mykola Marzhan November 26, 2021 at 12:36 PM

Hi,

Do you have any steps to reproduce?
Possible fix: delete the pod's PVC first, then delete the Pod (you may need to delete the pod twice if it gets stuck).
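
The order of operations in this workaround can be sketched as a small helper that builds the two `kubectl` commands without running them. This is an illustration under assumptions, not an official procedure: the cluster name `cluster1`, the namespace `pxc`, the function name `recovery_commands`, and the PVC naming pattern `datadir-<cluster>-pxc-<ordinal>` are all assumptions that should be checked against `kubectl get pvc` first. Deleting the PVC discards that replica's local data, so the node has to resync from the healthy members.

```python
import shlex


def recovery_commands(cluster, ordinal, namespace="default"):
    """Build the kubectl commands for the 'delete PVC, then delete pod' workaround.

    The pod and PVC names follow an assumed naming pattern; verify them
    in your own cluster before running anything.
    """
    pod = f"{cluster}-pxc-{ordinal}"
    pvc = f"datadir-{pod}"  # assumed PVC name, check with `kubectl get pvc`
    base = ["kubectl", "-n", namespace]
    return [
        # --wait=false returns immediately; the PVC is only actually removed
        # once the pod that still mounts it is gone.
        base + ["delete", "pvc", pvc, "--wait=false"],
        # Deleting the pod releases the PVC and lets the StatefulSet recreate both.
        base + ["delete", "pod", pod],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in recovery_commands("cluster1", 2, namespace="pxc"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the destructive step under manual control. If the pod gets stuck, the second command can simply be run again.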

Incomplete

Created November 22, 2021 at 5:13 PM
Updated March 5, 2024 at 5:43 PM
Resolved December 10, 2023 at 8:13 AM
