LP #952920: performance tuning for deadlock detect switch
Description
Reported in Launchpad by Hui Liu, last update 10-10-2016 09:17:09
Regarding the deadlock detection mechanism in InnoDB, it has long been debated whether recursive deadlock checking is needed in some special scenarios, such as many concurrent updates to the same record.
A post on Planet MySQL makes this recommendation: InnoDB is much faster when deadlock detection is disabled for workloads with a lot of concurrency and contention.
We are hitting exactly this scenario in one of Taobao's core applications, Item Center (IC). Most of the time performance is fine, but during special sales promotions (about once a month) it degrades badly, because a large number of Taobao users take part.
Here is the oprofile result (from a simulation of the online scenario):
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
2008672  84.8036  lock_deadlock_recursive
91364    3.8573   lock_has_to_wait
11216    0.4735   safe_mutex_lock
9719     0.4103   ut_delay
8047     0.3397   MYSQLparse(void*)
7938     0.3351   lock_rec_has_to_wait_in_queue
7788     0.3288   code_state
7601     0.3209   my_strnncoll_binary
6703     0.2830   dict_col_get_clust_pos_noninline
6598     0.2786   db_enter
6451     0.2724   db_return
5733     0.2420   db_doprnt
5503     0.2323   rec_get_offsets_func
5325     0.2248   ha_innobase::update_row(unsigned char const*, unsigned char*)
5241     0.2213   mutex_spin_wait
4931     0.2082   build_template(row_prebuilt_struct*, THD*, st_table*, unsigned int)
4655     0.1965   lock_rec_convert_impl_to_expl
As you can see, lock_deadlock_recursive dominates the profile, taking about 85% of the samples. So we added a switch to disable deadlock detection dynamically. For the IC application, deadlocks almost never occur because the business SQL logic has been tuned, so disabling detection appears to carry little risk so far.
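To illustrate why the check can dominate CPU time when many transactions update the same record, here is a minimal sketch in Python. It is a toy model, not InnoDB's code: the function names `deadlock_check_steps` and `total_steps` are hypothetical, and the model assumes that each transaction queued on the hot record waits for every transaction ahead of it and that each check visits a transaction at most once.

```python
def deadlock_check_steps(n_ahead: int) -> int:
    """Count wait-for edges inspected by one cycle search (toy model).

    Transactions 0..n_ahead-1 are already queued on one hot record
    (transaction 0 holds the lock). Transaction n_ahead is the newcomer,
    and every transaction i waits for all transactions j < i.
    """
    waits_for = {i: list(range(i)) for i in range(n_ahead + 1)}
    start = n_ahead
    visited = set()
    stack = [start]
    steps = 0
    while stack:
        trx = stack.pop()
        if trx in visited:
            continue
        visited.add(trx)
        for blocker in waits_for[trx]:
            steps += 1  # one wait-for edge inspected
            # In this workload edges only point to earlier transactions,
            # so the search never finds a cycle back to `start`.
            if blocker not in visited:
                stack.append(blocker)
    return steps


def total_steps(n_trx: int) -> int:
    """Total edges inspected when n_trx transactions arrive one by one."""
    return sum(deadlock_check_steps(k) for k in range(n_trx))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, deadlock_check_steps(n), total_steps(n))
```

In this model a single check behind k queued transactions inspects k(k+1)/2 edges and never finds a cycle, so all of that work is overhead for a workload that has no deadlocks.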
To make the scenario repeatable, a test case built from the data we hit (with sensitive columns altered) and the related patch will be provided later. Please help review them.
@Laurynas, ah, yes, the conflict is trivial, the purpose of mentioning it was for the question regarding innodb-purge-batch-size (and to imply that INNODB_MAX_PURGE_SIZE may not be required).