TokuDB crashed under small tokudb_max_lock_memory configuration

Description

The crash stack is as follows:

```
#0 0x00007fed3ffaa334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fed3ffa55d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fed3ffa54a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000c096f0 in toku_mutex_trylock_with_source_location (src_file=0xf4b370 "percona-server/storage/tokudb/PerconaFT/locktree/manager.cc", src_line=461, mutex=0x7fe950adc3e0) at percona-server/storage/tokudb/PerconaFT/portability/toku_pthread.h:273
#4 toku::locktree_manager::get_status (this=0x7fed3e41d4a0, statp=statp@entry=0x7fe937b9b050) at percona-server/storage/tokudb/PerconaFT/locktree/manager.cc:461
#5 0x0000000000bfa67c in env_get_engine_status (env=0x7fed3e047300, engstat=0x7fe937b9cc78, maxrows=320, num_rows=0x7fe937ba20c0, redzone_state=0x7fe937ba20ac, env_panicp=<optimized out>, env_panic_string_buf=0x7fe937ba1c88 "percona-server/storage/tokudb/PerconaFT/locktree/txnid_set.cc:98 get: Assertion `r == 0' failed (errno=0) (r=22)\n", env_panic_string_length=1024, include_flags=TOKU_ENGINE_STATUS) at percona-server/storage/tokudb/PerconaFT/src/ydb.cc:2246
#6 0x0000000000bf9e2a in env_get_engine_status_text (env=0x7fed3e047300, buff=0x7fe937ba2130 "BUILD_ID = 0\n", bufsiz=40960) at percona-server/storage/tokudb/PerconaFT/src/ydb.cc:2388
#7 0x0000000000ceebd8 in db_env_do_backtrace (outf=0x7fed3efb2880 <IO_2_1_stderr>) at percona-server/storage/tokudb/PerconaFT/portability/toku_assert.cc:126
#8 0x0000000000ceec93 in toku_do_backtrace_abort () at percona-server/storage/tokudb/PerconaFT/portability/toku_assert.cc:146
#9 0x0000000000ceedae in toku_do_assert_zero_fail (expr=expr@entry=22, expr_as_string=expr_as_string@entry=0xf4095f "r", function=function@entry=0xf68a00 <toku::txnid_set::get(unsigned long) const::_FUNCTION_> "get", file=file@entry=0xf68978 "percona-server/storage/tokudb/PerconaFT/locktree/txnid_set.cc", line=line@entry=98, caller_errno=<optimized out>) at percona-server/storage/tokudb/PerconaFT/portability/toku_assert.cc:177
#10 0x0000000000cb0bdf in toku::txnid_set::get (this=this@entry=0x7fe937bac5b0, i=i@entry=0) at percona-server/storage/tokudb/PerconaFT/locktree/txnid_set.cc:98
#11 0x0000000000cad70a in toku::lock_request::retry (this=0x7fe937bac760) at percona-server/storage/tokudb/PerconaFT/locktree/lock_request.cc:315
#12 0x0000000000cada34 in toku::lock_request::wait (this=this@entry=0x7fe937bac760, wait_time_ms=<optimized out>, killed_time_ms=4000, killed_callback=0xbf1ec0 <tokudb_killed_callback()>) at percona-server/storage/tokudb/PerconaFT/locktree/lock_request.cc:213
#13 0x0000000000ca4a19 in toku_db_wait_range_lock (db=0x7fe950adc000, txn=0x7fe950ac9100, request=request@entry=0x7fe937bac760) at percona-server/storage/tokudb/PerconaFT/src/ydb_row_lock.cc:237
#14 0x0000000000c9f7b2 in toku_c_getf_set (c=0x7fe937baca00, flag=<optimized out>, key=0x7fe937bacd30, f=<optimized out>, extra=<optimized out>) at percona-server/storage/tokudb/PerconaFT/src/ydb_cursor.cc:489
#15 0x0000000000ca0390 in toku_c_get (c=c@entry=0x7fe937baca00, key=key@entry=0x7fe937bacd30, val=val@entry=0x7fe937bacd50, flag=flag@entry=26) at percona-server/storage/tokudb/PerconaFT/src/ydb_cursor.cc:778
#16 0x0000000000ca83e4 in toku_db_get (db=<optimized out>, txn=<optimized out>, key=0x7fe937bacd30, data=0x7fe937bacd50, flags=<optimized out>) at percona-server/storage/tokudb/PerconaFT/src/ydb_db.cc:270
#17 0x0000000000ca91ad in autotxn_db_get (db=0x7fe950adc000, txn=0x7fe950ac9100, key=0x7fe937bacd30, data=0x7fe937bacd50, flags=<optimized out>) at percona-server/storage/tokudb/PerconaFT/src/ydb_db.cc:833
#18 0x0000000000bc95db in ha_tokudb::init_auto_increment (this=this@entry=0x7fe950a7a810) at percona-server/storage/tokudb/ha_tokudb.cc:7936
#19 0x0000000000bf13df in ha_tokudb::initialize_share (this=this@entry=0x7fe950a7a810, name=name@entry=0x7fe950a70360 "./xiangluo/sbtest16", mode=mode@entry=2) at percona-server/storage/tokudb/ha_tokudb.cc:1763
#20 0x0000000000bf19eb in ha_tokudb::open (this=0x7fe950a7a810, name=0x7fe950a70360 "./xiangluo/sbtest16", mode=2, test_if_locked=<optimized out>) at percona-server/storage/tokudb/ha_tokudb.cc:1895
#21 0x00000000005ec713 in handler::ha_open (this=0x7fe950a7a810, table_arg=table_arg@entry=0x7fe950a89000, name=0x7fe950a70360 "./xiangluo/sbtest16", mode=mode@entry=2, test_if_locked=18) at percona-server/sql/handler.cc:2748
#22 0x00000000007cc944 in open_table_from_share (thd=thd@entry=0x7fe967a89800, share=share@entry=0x7fe950a70010, alias=<optimized out>, db_stat=db_stat@entry=39, prgflag=prgflag@entry=44, ha_open_flags=<optimized out>, outparam=<optimized out>, outparam@entry=0x7fe950a89000, is_create_table=<optimized out>, is_create_table@entry=false) at percona-server/sql/table.cc:2455
#23 0x00000000006ea6ea in open_table (thd=thd@entry=0x7fe967a89800, table_list=table_list@entry=0x7fe950a4d240, ot_ctx=ot_ctx@entry=0x7fe937bad3b0) at percona-server/sql/sql_base.cc:3226
#24 0x00000000006f1e3c in open_and_process_table (ot_ctx=0x7fe937bad3b0, has_prelocking_list=false, prelocking_strategy=0x7fe937bad5f0, flags=0, counter=0x7fe967a8ba58, tables=0x7fe950a4d240, lex=0x7fe967a8b998, thd=0x7fe967a89800) at percona-server/sql/sql_base.cc:4810
#25 open_tables (thd=thd@entry=0x7fe967a89800, start=start@entry=0x7fe937bad5e8, counter=0x7fe967a8ba58, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7fe937bad5f0) at percona-server/sql/sql_base.cc:5317
#26 0x00000000006f256a in open_normal_and_derived_tables (thd=thd@entry=0x7fe967a89800, tables=0x7fe950a4d240, flags=flags@entry=0) at percona-server/sql/sql_base.cc:6029
#27 0x00000000005ab7ea in execute_sqlcom_select (thd=thd@entry=0x7fe967a89800, all_tables=<optimized out>) at percona-server/sql/sql_parse.cc:5750
#28 0x0000000000742a4d in mysql_execute_command (thd=thd@entry=0x7fe967a89800) at percona-server/sql/sql_parse.cc:3057
#29 0x0000000000747b28 in mysql_parse (thd=thd@entry=0x7fe967a89800, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fe937bae540) at percona-server/sql/sql_parse.cc:7080
#30 0x000000000074914d in dispatch_command (command=<optimized out>, thd=0x7fe967a89800, packet=0x7fe967aff501 "SELECT c FROM sbtest16 WHERE id=51", packet_length=<optimized out>) at percona-server/sql/sql_parse.cc:1487
#31 0x000000000074b061 in do_command (thd=<optimized out>) at percona-server/sql/sql_parse.cc:1062
#32 0x000000000070eb7d in do_handle_one_connection (thd_arg=thd_arg@entry=0x7fe967a89800) at percona-server/sql/sql_connect.cc:1590
#33 0x000000000070ebc9 in handle_one_connection (arg=0x7fe967a89800) at percona-server/sql/sql_connect.cc:1494
#34 0x00007fed3ffa3aa1 in start_thread () from /lib64/libpthread.so.0
#35 0x00007fed3ed0c93d in clone () from /lib64/libc.so.6
```

Reproduction

1. Set `tokudb_max_lock_memory=8192` in my.cnf (you can try with an even smaller value).

2. Use sysbench oltp.lua to prepare and then run; the parameters I used (a full command sketch follows below):
`--oltp_tables_count=64 --oltp_table_size=100 --num-threads=512 --mysql-table-engine=tokudb --mysql-ignore-errors=1206,1205,1213`
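
Putting the two steps together, a minimal end-to-end sketch; the oltp.lua path, MySQL user, and socket below are placeholders for whatever your sysbench installation uses:

```
# my.cnf: force a tiny lock-memory budget to make the crash easy to hit
[mysqld]
tokudb_max_lock_memory=8192

# prepare the tables, then run the workload (connection options are placeholders)
sysbench --test=/usr/share/sysbench/oltp.lua \
    --oltp_tables_count=64 --oltp_table_size=100 --num-threads=512 \
    --mysql-table-engine=tokudb --mysql-ignore-errors=1206,1205,1213 \
    --mysql-user=root --mysql-socket=/tmp/mysql.sock prepare
sysbench --test=/usr/share/sysbench/oltp.lua \
    --oltp_tables_count=64 --oltp_table_size=100 --num-threads=512 \
    --mysql-table-engine=tokudb --mysql-ignore-errors=1206,1205,1213 \
    --mysql-user=root --mysql-socket=/tmp/mysql.sock run
```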

Fix

The reason is that the TokuDB locktree memory is exhausted, and this case is not handled properly in `lock_request::retry()`:

```
if (r == 0) {
    // lock granted: finish the request and wake the waiter
    remove_from_lock_requests();
    complete(r);
    if (m_retry_test_callback)
        m_retry_test_callback(); // test callback
    toku_cond_broadcast(&m_wait_cond);
} else {
    // assumes the failure is a lock conflict; when r is
    // TOKUDB_OUT_OF_LOCKS the conflicts set is empty and
    // conflicts.get(0) trips the assertion seen in the stack above
    m_conflicting_txnid = conflicts.get(0);
}
```

We should handle the `r == TOKUDB_OUT_OF_LOCKS` case as well.
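
A minimal sketch of such handling, assuming that on `TOKUDB_OUT_OF_LOCKS` there is no conflicting transaction to record and the request should simply be completed with the error (the fix that was eventually merged may differ in detail):

```
if (r == 0 || r == TOKUDB_OUT_OF_LOCKS) {
    // lock granted, or the locktree ran out of lock memory: either way
    // there is no conflicting txnid to wait on, so complete the request
    // with r (the waiter sees TOKUDB_OUT_OF_LOCKS and can roll back)
    remove_from_lock_requests();
    complete(r);
    if (m_retry_test_callback)
        m_retry_test_callback(); // test callback
    toku_cond_broadcast(&m_wait_cond);
} else {
    // a genuine conflict: remember the first conflicting txnid
    m_conflicting_txnid = conflicts.get(0);
}
```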

Environment

None

Activity

Julia Vural 
March 4, 2025 at 9:10 PM

It appears that this issue is no longer being worked on, so we are closing it for housekeeping purposes. If you believe the issue still exists, please open a new ticket after confirming it's present in the latest release.

Rich Prohaska 
April 9, 2020 at 3:59 PM

Fixed in https://github.com/percona/PerconaFT/pull/441, which should be a simpler merge than pull 436.

Rich Prohaska 
April 8, 2020 at 8:03 PM

Fungo Wang 
April 18, 2018 at 9:55 AM

Hi George,

This bug was actually spotted in our production environment, in which max_lock_memory (8M) is 12.5% of the tokudb cache size (64M).

In the bug report description, the suggested tokudb_max_lock_memory=8192 value is intended to make the case easy to reproduce.

So, under a highly concurrent load with a relatively small tokudb_cache_size, this bug is not that hard to encounter.

George Lorch 
April 17, 2018 at 8:17 PM

If this is truly related to a very low max_lock_memory, then it is going on hold, as chances are it will never be properly fixed as-is. What needs to happen is more graceful handling of hitting max_lock_memory, returning an error and properly rolling back, but this has many tentacles, and the very simple fix is to not use such a low value.

The short-term fix is to disallow small values for max_lock_memory and force them up to some minimal threshold. The default value is 12.5% of the tokudb cache size.
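
For illustration, a minimal sketch of such init-time clamping; the variable names and the 64 KiB floor are hypothetical, not the actual TokuDB code:

```
// Hypothetical floor for the lock-memory budget (threshold not tuned).
static const uint64_t MIN_LOCK_MEMORY = 64ULL * 1024;

// During plugin initialization, after my.cnf values have been read:
if (tokudb_max_lock_memory == 0) {
    // unset: default to 12.5% of the tokudb cache size
    tokudb_max_lock_memory = tokudb_cache_size / 8;
} else if (tokudb_max_lock_memory < MIN_LOCK_MEMORY) {
    // refuse pathologically small budgets such as the 8192 in this report
    tokudb_max_lock_memory = MIN_LOCK_MEMORY;
}
```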

Won't Do

Details

Assignee

Reporter

Components

Affects versions

Priority

Created March 21, 2018 at 9:23 AM
Updated March 4, 2025 at 9:10 PM
Resolved March 4, 2025 at 9:10 PM