Incomplete
Assignee: Lalit Choudhary
Reporter: lpjirasync (Deactivated)
Created January 24, 2018 at 8:31 AM
Updated March 6, 2024 at 2:00 PM
Resolved March 29, 2020 at 4:15 PM
Reported in Launchpad by Rick Pizzi (last update 25-07-2017 09:15:27)
Yesterday our master hit a severe stall that ended with InnoDB committing suicide via an assertion:
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2015-09-26 20:54:46 7efda0ca3700 InnoDB: Assertion failure in thread 139627789432576 in file srv0srv.cc line 2128
The InnoDB engine status dump (attached to this bug) shows the entire InnoDB subsystem locked, with the main thread as the blocker; the main thread seems to be stuck in state "enforcing dict cache limit".
Looking at the error log, what puzzles me is the following (139627810412288 is the main thread):
--Thread 139627810412288 has waited at srv0srv.cc line 2596 for 241.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x134c460 '&dict_operation_lock'
a writer (thread id 139627810412288) has reserved it in mode wait exclusive
number of readers 7, waiters flag 1, lock_word: fffffffffffffff9
Last time read locked in file row0ins.cc line 1803
Last time write locked in file /mnt/workspace/percona-server-5.6-redhat-binary/label_exp/centos6-64/rpmbuild/BUILD/percona-server-5.6.25-73.1/storage/innobase/dict/dict0stats.cc line 2385
From the above, it seems that the blocker was blocked by itself, which is quite odd.
The thread already owns &dict_operation_lock but wants to acquire it a second time. That's a deadlock.
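The lock_word value in the diagnostic can be decoded by hand. The sketch below assumes the MySQL/Percona Server 5.6 rw-latch encoding described in the comments of sync0rw.h (X_LOCK_DECR = 0x00100000, each shared lock decrementing the word by one, an exclusive request decrementing it by X_LOCK_DECR); the helper names to_signed64 and describe_lock_word are hypothetical. Under that assumption, fffffffffffffff9 read as a signed 64-bit value is -7, which corresponds to a writer in wait_ex with 7 readers still holding the latch, consistent with the "number of readers 7" line above.

```python
# Sketch: decode an InnoDB 5.6 rw-latch lock_word as printed in a
# semaphore-wait diagnostic. ASSUMPTION: the 5.6 encoding from the
# sync0rw.h comments, where X_LOCK_DECR = 0x00100000.

X_LOCK_DECR = 0x00100000


def to_signed64(hex_word: str) -> int:
    """Interpret a printed hex lock_word as a signed 64-bit integer."""
    value = int(hex_word, 16)
    if value >= 1 << 63:
        value -= 1 << 64
    return value


def describe_lock_word(lock_word: int) -> str:
    """Map a signed lock_word onto the latch states assumed above."""
    if lock_word == X_LOCK_DECR:
        return "unlocked"
    if 0 < lock_word < X_LOCK_DECR:
        return f"read locked by {X_LOCK_DECR - lock_word} reader(s), no waiting writer"
    if lock_word == 0:
        return "write locked"
    if -X_LOCK_DECR < lock_word < 0:
        # A writer has reserved the latch (wait_ex) and waits for readers to leave.
        return f"read locked by {-lock_word} reader(s), with a writer waiting (wait_ex)"
    if lock_word <= -X_LOCK_DECR:
        return "recursively write locked"
    return "invalid lock_word"


word = to_signed64("fffffffffffffff9")
print(word)
print(describe_lock_word(word))
```

Running it prints -7 followed by the wait_ex description; other lock_word values from an error log can be passed through the same two helpers.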
We are running Percona Server 5.6.25-73.1-log; my.cnf and the error log are attached.
Additional info which may help troubleshooting:
At the time the problem occurred, an online schema change was running and about to complete.
The table being altered was a very large (1 billion rows) partitioned table. It is likely that the OSC tool was about to execute, or was executing, the atomic rename of tables. This is only speculation, though: the operation never completed and is therefore not logged in the binary logs or anywhere else.
Thanks
Rick