query returns incorrect result for table with no PK and non-unique CHAR utf8mb4 SK
Description
Environment
blocks
is triggered by
relates to
Smart Checklist
Activity
Yura Sorokin October 22, 2019 at 11:20 AM
In 8.0.17 merge in particular
Kamil Holubicki October 22, 2019 at 7:31 AM
Changes for 8.0 merged by .
Need backport changes to 5.7
Kamil Holubicki October 18, 2019 at 11:54 AMEdited
Hi ,
I have investigated why we do not observe the problem on 5.7:
Background:
We have table without primary key
ICP is used if both flags are set: HA_KEYREAD_ONLY and HA_DO_INDEX_COND_PUSHDOWN. Both are set in ha_rocksdb::index_flags()
HA_DO_INDEX_COND_PUSHDOWN blocks ICP directly in QEP_TAB::push_index_cond()
HA_KEYREAD_ONLY affects flags of part_of_key map in Item_field descriptor, which is used in make_cond_for_index() (sql_select.cc). If flag is not set (there is no index on key), this function is not able to create condition, and ICP is not set.
5.7:
open_binary_frm() (table.cc) detects that there is no primary key and sets share->primary_key to MAX_KEY value
ha_rocksdb::index_flags() knows that this is not PK, so does not set HA_KEYREAD_ONLY, but sets HA_DO_INDEX_COND_PUSHDOWN
when ICP is to be configured its configuration is cut-off on stage 4 (above)
8.0:
prepare_share() (dd_table_share.cc) does not detect that there is no primary key. share->primary_key has its default value (0)
ha_rocksdb::index_flags() sees that index 0 is PK, so sets HA_KEYREAD_ONLY
prepare_share() detects that there is no primary key and sets share->primary_key to MAX_KEY value. But HA_KEYREAD_ONLY is already set!
Field::optimize_range() calls ha_rocksdb::index_flags(). This time we know that index 0 is not PK, so we set HA_DO_INDEX_COND_PUSHDOWN.
As the result we have both flags set, so ICP can be configured.
I have added fix for above as commit 14b343947853c1d6eb1566f140a8e001869d8f64 to
https://github.com/kamil-holubicki/percona-server/tree/PS-5024-v2-8.0
In my opionion primary_key member should be set to KEY_MAX when TABLE_SHARE is created, because the assumption that 0 key is primary seems not good, but I'm not sure about this approach, as some other parts of code can rely on default 0 value (commented part of commit in table.h). If that's true, in prepare_share() would need to only handle PK setup if it exists.
Anyway, if not above fix in table.h I have done similar thing which is done on 5.7 in dd_table_share.cc
In my opinion we need 2 things that are present on the branch:
1. change in ha_rocksdb.cc (commit c100a1d31176fe8a6bc7602f1c82744029922255)
2. Above change (commit: 14b343947853c1d6eb1566f140a8e001869d8f64)
George Lorch October 15, 2019 at 12:22 PM
Thank you ! This actually makes sense. MyRocks does not support direct unpacking/memcmp of non-binary collation types. Your fix is probably the correct course of action here, do no allow ICP for non memcmp-able varchar indec collation types.
Kamil Holubicki October 15, 2019 at 11:57 AM
Here are my findings:
rdb_datadir.cc Rdb_field_packing::setup(): It sets pack and unpack functions for a given key. Pack is set always, but unpack for strings is set only if there is available decoder for given charset/collation. Right now there are available unpack functions for simple collations that use my_collation_8bit_simple_ci_handler MY_COLLATION_HANDLER (the ones configured in ctype-extra.cc CHARSET_INFO compiled_charsets[]). For other collations unpack function is set to NULL;
RocksDB uses pack function to create key in index.
ICP needs value unpacked from index key => here we need key unpack function. Then unpacked value is used to check index condition (ha_rocksdb.cc check_index_cond() => item_cmpfunc.cc Arg_comparator::compare_string()). If value is not decoded (null decode function), buffer of value to compare contains junk which causes problems.
For fields other than strings, there are pack/unpack pairs defined (rdb_datadir.cc Rdb_field_packing::setup();
For utf8mb4 pack function maps to Rdb_key_def::pack_with_make_sort_key() => my_strnxfrm_uca_900() from my_collation_uca_900_handler MY_COLLATION_HANDLER ctype-uca.cc)
I looked briefly on 5.7 to see what happens there, and it seems that "it works" because it is masked by some other bug. In simple, ICP is not used for strings, because sql_select.cc uses_index_fields_only() line 1604 returns false (sql_optimizer.cc line 6312, return item_field->field->part_of_key.is_set(keyno) returns false => flag is not set). I didn't investigate it deeper yet.
How to fix:
We can block usage of ICP if we detect that key cannot be unpacked. This seems to be simple fix in ha_rocksdb::index_flags() that prevents ICP to be used when we detect this. For fields that support unpacking, ICP will be used.
We can implement unpack functions for more complicated collations. These would be functions complementary to my_strnxfrm_uca_900, but the effort here is much more higher.
I have implemented fix 1 proposal (probably still needs more testing/analysis/investigation)
https://github.com/kamil-holubicki/percona-server/tree/PS-5024-8.0
Details
Assignee
Kamil HolubickiKamil HolubickiReporter
George LorchGeorge Lorch(Deactivated)Labels
Time tracking
1w 1d 5h 7m loggedComponents
Fix versions
Affects versions
Priority
High
Details
Details
Assignee
Reporter
Labels
Time tracking
Components
Fix versions
Affects versions
Priority
Smart Checklist
Open Smart Checklist
Smart Checklist
Open Smart Checklist
Smart Checklist
