crash on snapshot size check in RocksDB
Description
Activity

George Lorch November 20, 2018 at 9:08 PM
Due to and the reversion, this is still an issue until we perform another upstream merge.

George Lorch September 13, 2018 at 5:23 PM (edited)
PS 8.0.12
rocksdb.autoinc_debug
rocksdb.drop_index_inplace
rocksdb.nonflushing_analyze_debug
rocksdb.nonflushing_analyze_parts_debug

George Lorch August 1, 2018 at 7:13 PM
It seems this RocksDB commit is the fix: https://github.com/facebook/rocksdb/commit/bb2a2ec7313e9af648fc9ac613289e18ed019eb0

George Lorch August 1, 2018 at 5:22 PM (edited)
The first difference is that in 8.0 TC_BINLOG is used by default, because the binlog is enabled by default in 8.0. In 5.7 it is not, and thus TC_MMAP is used. This means the commit process in 8.0 uses the group commit 2PC process, i.e. it calls prepare and then commit, whereas in 5.7 there is no prepare phase. As a result, a different path is taken within PessimisticTransaction::Rollback() in ./storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc.
Enabling binlog in 5.7 reproduces the issue with the same stack trace.
This does not happen on upstream Facebook MySQL 5.6. There, the binlog group commit algorithm does not appear to come into play, so 'prepare' is never called.
This DOES happen on upstream fb-mysql at the fb-prod201801 tag that we are merged up to, using the same rocksdb submodule commit pointer that we have, as suggested by Facebook. There have been changes to the 2PC transaction model since that commit. From here I will do two things: 1) try rebuilding PS with an updated rocksdb submodule that pulls in these changes; 2) report this to the upstream team for confirmation/clarification of the issue.
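To make the difference between the two commit flows concrete, here is a toy sketch of a transaction whose rollback branches on whether a prepare phase ran. It is only an illustration of the idea described in this comment: the class, state names, and branch labels (ToyTransaction, rollback_prepared, and so on) are hypothetical and are not taken from the RocksDB or MySQL sources.

```python
from enum import Enum, auto


class TxnState(Enum):
    STARTED = auto()
    PREPARED = auto()
    COMMITTED = auto()
    ROLLED_BACK = auto()


class ToyTransaction:
    """Toy stand-in for a storage-engine transaction (not RocksDB code)."""

    def __init__(self):
        self.state = TxnState.STARTED
        self.path = []  # records which branches were taken, in order

    def prepare(self):
        # Only reached when the server drives a two-phase commit.
        if self.state is not TxnState.STARTED:
            raise RuntimeError("prepare() requires a started transaction")
        self.state = TxnState.PREPARED
        self.path.append("prepare")

    def commit(self):
        if self.state is TxnState.PREPARED:
            self.path.append("commit_prepared")
        elif self.state is TxnState.STARTED:
            self.path.append("commit_unprepared")
        else:
            raise RuntimeError("transaction already finished")
        self.state = TxnState.COMMITTED

    def rollback(self):
        # The branch taken depends on whether prepare() ran first.
        if self.state is TxnState.PREPARED:
            self.path.append("rollback_prepared")
        elif self.state is TxnState.STARTED:
            self.path.append("rollback_unprepared")
        else:
            raise RuntimeError("transaction already finished")
        self.state = TxnState.ROLLED_BACK


def run(two_phase: bool, fail: bool) -> list:
    """Drive one toy transaction; two_phase models the binlog being enabled."""
    txn = ToyTransaction()
    if two_phase:
        txn.prepare()
    if fail:
        txn.rollback()
    else:
        txn.commit()
    return txn.path
```

In this model, run(two_phase=True, fail=True) goes through the prepared rollback branch, while run(two_phase=False, fail=True) never does, which mirrors why the crash only shows up once the binlog (and with it the prepare phase) is enabled.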
The following crash takes place on RocksDB debug build 8.0.11:
The crash can be repeated with the following mtr test: