PXC cluster CrashLoopBackOff : Found 1 prepared transactions!

General

Escalation

Description

Hi! One of the three replicas in my PXC 5.7 cluster is crash looping with the following error:

2021-11-22T08:57:27.796235Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.34-37 started; log sequence number 2264189188
2021-11-22T08:57:27.796281Z 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
2021-11-22T08:57:27.797037Z 0 [Note] Plugin 'FEDERATED' is disabled.
2021-11-22T08:57:27.809704Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-11-22T08:57:27.809727Z 0 [Note] InnoDB: Transaction 12760 in prepared state after recovery
2021-11-22T08:57:27.809731Z 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-11-22T08:57:27.809736Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-11-22T08:57:27.809739Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-11-22T08:57:27.809753Z 0 [Warning] WSREP: Discovered discontinuity in recovered wsrep transaction XIDs. Truncating the recovery list to 0 entries
2021-11-22T08:57:27.809757Z 0 [Note] WSREP: Last wsrep seqno to be recovered 2656
2021-11-22T08:57:27.809852Z 0 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2021-11-22T08:57:27.809862Z 0 [ERROR] Aborting
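When triaging a crash-looping pod, it can help to pull the relevant lines out of the log automatically. The following is a minimal sketch, not part of the operator: the function name `summarize_prepared_transactions` is hypothetical, and the regular expressions simply mirror the wording of the log lines quoted above.

```python
import re

# Patterns mirror the wording of the mysqld log lines quoted in this report.
PREPARED_TRX = re.compile(r"InnoDB: Transaction (\d+) in prepared state after recovery")
ABORT_ERROR = re.compile(r"\[ERROR\] Found (\d+) prepared transactions!")


def summarize_prepared_transactions(log_text):
    """Return the prepared transaction IDs and the count from the fatal error line."""
    trx_ids = [int(m.group(1)) for m in PREPARED_TRX.finditer(log_text)]
    fatal = ABORT_ERROR.search(log_text)
    return {
        "transaction_ids": trx_ids,
        "fatal_count": int(fatal.group(1)) if fatal else 0,
        # mysqld aborts startup only when the [ERROR] line is present.
        "blocks_startup": fatal is not None,
    }


# Sample input: three lines from the log above (the last one shortened).
sample = """\
2021-11-22T08:57:27.809727Z 0 [Note] InnoDB: Transaction 12760 in prepared state after recovery
2021-11-22T08:57:27.809739Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-11-22T08:57:27.809852Z 0 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time
"""
print(summarize_prepared_transactions(sample))
# → {'transaction_ids': [12760], 'fatal_count': 1, 'blocks_startup': True}
```

The log text would typically come from the crashed container's previous run, for example via `kubectl logs --previous` on the affected pod.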

I'm using the operator installed with the pxc-operator chart (chart version 1.9.1) and an instance installed with the pxc-db chart (chart version 1.9.1, MySQL version 5.7-recommended, no extra configuration).

Is this a bug? I think the operator should handle this case automatically. I don't know why only 1 of the 3 replicas is crash looping while the others are OK. Moreover, all the HAProxy pods in front of the PXC instances are not ready, so there is no HA here.

Environment

None

Activity

Aaditya Dubey December 10, 2023 at 8:13 AM

Hi @Antoine Ozenne,

Closing the report because there has been no activity for a long time.

Aaditya Dubey August 12, 2022 at 1:10 PM

Hi @Antoine Ozenne,

Thank you for the report.
Please let me know if the issue still persists.

Antoine Ozenne January 4, 2022 at 9:15 AM

The problem occurs frequently and makes the instance unusable (because all the HAProxy pods go into CrashLoopBackOff). Moreover, I have just returned from vacation and discovered that the issue is present on 2 of the 3 PXC pods this time. I will try to reproduce your fix, but what can I do to investigate?

Antoine Ozenne November 26, 2021 at 4:20 PM

Hi @Mykola Marzhan,

Thanks for the fix! I deleted the PVC first and the pod afterwards. The pod then restarted 3 times before becoming ready. I checked inside the pod and all the data has been replicated, so the fix works!

To reproduce: the pod has to crash during a long transaction (about an hour in my case). When the pod restarts, it complains about the "pending" transaction.

I see the default value of innodb_flush_log_at_trx_commit is 0. Could this be the problem?
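For context on the question above, the three documented values of `innodb_flush_log_at_trx_commit` differ in when the redo log is written and flushed to disk. The lookup below is only a reference sketch (the helper name `describe_flush_setting` is hypothetical); it does not establish whether this setting is related to the prepared-transaction error.

```python
# Documented meanings of innodb_flush_log_at_trx_commit in MySQL 5.7.
FLUSH_LOG_AT_TRX_COMMIT = {
    0: "redo log written and flushed to disk about once per second",
    1: "redo log written and flushed to disk at every commit (MySQL default)",
    2: "redo log written at every commit, flushed to disk about once per second",
}


def describe_flush_setting(value):
    """Return a short description of the given setting, or raise on unknown values."""
    if value not in FLUSH_LOG_AT_TRX_COMMIT:
        raise ValueError(f"unsupported innodb_flush_log_at_trx_commit value: {value!r}")
    return FLUSH_LOG_AT_TRX_COMMIT[value]


print(describe_flush_setting(0))
```

The value in effect on a running node can be checked with `SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';`.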

Mykola Marzhan November 26, 2021 at 12:36 PM

Hi @Antoine Ozenne

Do you have any steps to reproduce?
Possible fix: delete the pod's PVC first, then delete the pod (you may need to delete the pod twice if it gets stuck).
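The suggested workaround can be sketched as two kubectl commands issued in order. This is only an illustration under stated assumptions: the pod name `cluster1-pxc-2`, the namespace `pxc`, and the `datadir-<pod name>` claim naming are hypothetical placeholders to be replaced with the real names from `kubectl get pvc`. Deleting the claim discards that node's local data, so this relies on the remaining cluster members being healthy enough to resync it.

```python
import shlex


def recovery_commands(pod, namespace, pvc=None):
    """Build the kubectl commands for the workaround: PVC first, then the pod."""
    # Assumption: the data volume claim is named "datadir-<pod name>".
    pvc = pvc or f"datadir-{pod}"
    return [
        # --wait=false returns immediately; the claim stays in Terminating
        # for as long as the pod is still using it.
        ["kubectl", "delete", "pvc", pvc, "-n", namespace, "--wait=false"],
        # Deleting the pod releases the claim and lets a fresh pod be created.
        ["kubectl", "delete", "pod", pod, "-n", namespace],
    ]


# Print the commands instead of running them, so they can be reviewed first.
for cmd in recovery_commands("cluster1-pxc-2", "pxc"):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch side-effect free; the commands can be pasted into a shell once the names have been verified.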

Incomplete

Assignee

Unassigned

Reporter

Antoine Ozenne

Priority

Medium

Created November 22, 2021 at 5:13 PM

Updated March 5, 2024 at 5:43 PM

Resolved December 10, 2023 at 8:13 AM
