PXC node crash due to metadata lock issue with TOI primary key change

Description

On 2 of the 3 nodes in my cluster, mysqld crashed (once per node) on Percona XtraDB Cluster 8.0.35. Both crashes occurred during primary key changes executed in TOI mode, and both stack traces point into the metadata lock (MDL) code. Neither crash logged an 'assertion failure' message.

Crash 1 (ALTER DATABASE ... CHARACTER SET & COLLATE):

2024-07-03T07:35:37.514055Z 0 [Note] [MY-000000] [Galera] Deleted page /var/lib/mysql/gcache.page.000173
2024-07-03T07:41:10.281484Z 1328758 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2024-07-03T07:41:10Z UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=82e0381828780a878c13947ae9217abad3e30d93
Server Version: 8.0.35-27.1 Percona XtraDB Cluster (GPL), Release rel27, Revision 84d9464, WSREP version 26.1.4.3, wsrep_26.1.4.3
Thread pointer: 0x7fc954d42240
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7fd4adc77330 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x55b9b6823ce1]
/usr/sbin/mysqld(print_fatal_signal(int)+0x39f) [0x55b9b584eb2f]
/usr/sbin/mysqld(handle_fatal_signal+0xd8) [0x55b9b584ec18]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fd4ce75b520]
/usr/sbin/mysqld(wsrep_handle_mdl_conflict(MDL_context const*, MDL_ticket*, MDL_key const*)+0x88) [0x55b9b586d598]
/usr/sbin/mysqld(MDL_lock::can_grant_lock(enum_mdl_type, MDL_context const*) const+0x8de) [0x55b9b553433e]
/usr/sbin/mysqld(MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**)+0x4d8) [0x55b9b55380f8]
/usr/sbin/mysqld(MDL_context::acquire_lock(MDL_request*, unsigned long)+0xae) [0x55b9b55385de]
/usr/sbin/mysqld(MDL_context::acquire_locks(I_P_List<MDL_request, I_P_List_adapter<MDL_request, &MDL_request::next_in_list, &MDL_request::prev_in_list>, I_P_List_counter, I_P_List_no_push_back<MDL_request> >, unsigned long)+0xf6) [0x55b9b5539c06]
/usr/sbin/mysqld(lock_table_names(THD, Table_ref*, Table_ref*, unsigned long, unsigned int, Prealloced_array<MDL_request*, 1ul>)+0x8c8) [0x55b9b5620908]
/usr/sbin/mysqld(mysql_alter_db(THD, char const*, HA_CREATE_INFO*)+0x305) [0x55b9b565bfb5]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x5408) [0x55b9b56c1478]
/usr/sbin/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x610) [0x55b9b56c29c0]
/usr/sbin/mysqld(+0x1269ff5) [0x55b9b56c2ff5]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x39a7) [0x55b9b56c77d7]
/usr/sbin/mysqld(do_command(THD*)+0x204) [0x55b9b56c7e84]
/usr/sbin/mysqld(+0x13e5418) [0x55b9b583e418]
/usr/sbin/mysqld(+0x2872e99) [0x55b9b6ccbe99]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fd4ce7adac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7fd4ce83f850]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc9550c5030): ALTER DATABASE schema_a CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
Connection ID (thread ID): 1328758
Status: NOT_KILLED
You may download the Percona XtraDB Cluster operations manual by visiting http://www.percona.com/software/percona-xtradb-cluster/. You may find information in the manual which will help you identify the cause of the crash.
Log of wsrep recovery (--wsrep-recover):

Crash 2 (ALTER TABLE ... COLLATE):

2024-07-02T12:36:47.735676Z 0 [Note] [MY-000000] [Galera] Deleted page /var/lib/mysql/gcache.page.000227
2024-07-02T16:35:09.252455Z 11 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2024-07-02T16:35:09Z UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=82e0381828780a878c13947ae9217abad3e30d93
Server Version: 8.0.35-27.1 Percona XtraDB Cluster (GPL), Release rel27, Revision 84d9464, WSREP version 26.1.4.3, wsrep_26.1.4.3
Thread pointer: 0x7f75e1c00e00
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7f807d71b350 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x559d9d274ce1]
/usr/sbin/mysqld(print_fatal_signal(int)+0x39f) [0x559d9c29fb2f]
/usr/sbin/mysqld(handle_fatal_signal+0xd8) [0x559d9c29fc18]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f80b370e520]
/usr/sbin/mysqld(wsrep_handle_mdl_conflict(MDL_context const*, MDL_ticket*, MDL_key const*)+0xa2b) [0x559d9c2bef3b]
/usr/sbin/mysqld(MDL_lock::can_grant_lock(enum_mdl_type, MDL_context const*) const+0x8de) [0x559d9bf8533e]
/usr/sbin/mysqld(MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**)+0x4d8) [0x559d9bf890f8]
/usr/sbin/mysqld(MDL_context::acquire_lock(MDL_request*, unsigned long)+0xae) [0x559d9bf895de]
/usr/sbin/mysqld(MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, unsigned long)+0x16d) [0x559d9bf89f1d]
/usr/sbin/mysqld(+0x1314ba3) [0x559d9c1beba3]
/usr/sbin/mysqld(mysql_alter_table(THD*, char const*, char const*, HA_CREATE_INFO*, Table_ref*, Alter_info*)+0x4fce) [0x559d9c1d5a0e]
/usr/sbin/mysqld(Sql_cmd_alter_table::execute(THD*)+0xeee) [0x559d9c05ae7e]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x1eef) [0x559d9c10ef5f]
/usr/sbin/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x610) [0x559d9c1139c0]
/usr/sbin/mysqld(Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long)+0xa9f) [0x559d9cebbdef]
/usr/sbin/mysqld(Log_event::apply_event(Relay_log_info*)+0x78) [0x559d9ceb3ae8]
/usr/sbin/mysqld(wsrep_apply_events(THD*, Relay_log_info*, void const*, unsigned long)+0x1de) [0x559d9c50178e]
/usr/sbin/mysqld(+0x1405940) [0x559d9c2af940]
/usr/sbin/mysqld(Wsrep_high_priority_service::apply_toi(wsrep::ws_meta const&, wsrep::const_buffer const&, wsrep::mutable_buffer&)+0x463) [0x559d9c2b2033]
/usr/sbin/mysqld(wsrep::server_state::on_apply(wsrep::high_priority_service&, wsrep::ws_handle const&, wsrep::ws_meta const&, wsrep::const_buffer const&)+0x1e9) [0x559d9dc22089]
/usr/sbin/mysqld(+0x2d8acbe) [0x559d9dc34cbe]
/usr/lib/galera4/libgalera_smm.so(+0x5b51c) [0x7f80a56de51c]
/usr/lib/galera4/libgalera_smm.so(+0x712d4) [0x7f80a56f42d4]
/usr/lib/galera4/libgalera_smm.so(+0x7560e) [0x7f80a56f860e]
/usr/lib/galera4/libgalera_smm.so(+0x9af21) [0x7f80a571df21]
/usr/lib/galera4/libgalera_smm.so(+0x9be82) [0x7f80a571ee82]
/usr/lib/galera4/libgalera_smm.so(+0x76f6b) [0x7f80a56f9f6b]
/usr/lib/galera4/libgalera_smm.so(+0x4cea2) [0x7f80a56cfea2]
/usr/sbin/mysqld(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*)+0x12) [0x559d9dc35342]
/usr/sbin/mysqld(+0x144607d) [0x559d9c2f007d]
/usr/sbin/mysqld(start_wsrep_THD+0x394) [0x559d9bf97214]
/usr/sbin/mysqld(+0x2872e99) [0x559d9d71ce99]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f80b3760ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f80b37f2850]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f75bf8d3878): ALTER TABLE schema_b.table COLLATE utf8mb4_unicode_ci
Connection ID (thread ID): 11
Status: NOT_KILLED
You may download the Percona XtraDB Cluster operations manual by visiting http://www.percona.com/software/percona-xtradb-cluster/. You may find information in the manual which will help you identify the cause of the crash.
Log of wsrep recovery (--wsrep-recover):

Is this a known issue? Would upgrading help?

The ALTERs were executed in TOI mode. One crash occurred on the writer node, the other on a reader node. Each crash affected only 1 node.

Thank you,

Environment

None

Activity

Michaël de Groot September 6, 2024 at 12:22 PM

Unfortunately I am not with this customer anymore (I spent a week there on a consultancy engagement). I will ping them once more, to ask if they can provide the log file. Please close the issue in 14 days or so if no log file is uploaded.

Aaditya Dubey September 6, 2024 at 12:08 PM

Hi

We still haven't heard any news from you. So, I assume the issue no longer persists and will close the ticket. If you disagree, reply and create a follow-up with a new Jira report.

Aaditya Dubey August 22, 2024 at 11:48 AM

Hi

Thank you for the report.
Please share the complete error log at the location below:

To upload log files, please use the Percona SFTP server.
• Server: `sftp.percona.com`
• Port: `2222`
• Protocol: `sftp`
• Username: PXC-4484
• Password: PXC-4484
• Upload via command line: `scp -P2222 ./PXC-4484.tar.gz PXC-4484@sftp.percona.com:PXC-4484.tar.gz`
• NOTES: BLIND UPLOAD ONLY service, directory listing disabled, directory mirroring disabled.
ADVICE: upload a tar file or similar with everything included there
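As a rough sketch of the packaging step, the files could be bundled into a single archive before running `scp`. The helper name `bundle_logs` is hypothetical, and the log path in the usage comment is only an assumption; check the `log_error` setting on each node for the real location.

```shell
# Hypothetical helper: bundle one or more log files into a gzip-compressed tar.
# Usage: bundle_logs ARCHIVE FILE...
bundle_logs() {
    archive="$1"
    shift
    # -c create, -z gzip-compress, -f write to the named archive
    tar -czf "$archive" "$@"
}

# Example (assumed log path, adjust to your server):
#   bundle_logs ./PXC-4484.tar.gz /var/log/mysql/error.log
```

The resulting `PXC-4484.tar.gz` is the file the upload command expects in the current directory.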

This stack trace looks similar to https://perconadev.atlassian.net/browse/PXC-4389. However, we don't have reproducible steps for this one either.

Incomplete

Details

Assignee

Reporter

Needs QA

Yes

Components

Affects versions

Priority

Created August 19, 2024 at 3:43 PM
Updated September 6, 2024 at 12:22 PM
Resolved September 6, 2024 at 12:09 PM
