Replication breaks with HA_ERR_KEY_NOT_FOUND (INDEX_SCAN,HASH_SCAN)

Description

Tested with PS 8.0.37-29-1 and PS 8.4.0-1

Replication breaks with HA_ERR_KEY_NOT_FOUND when using slave_rows_search_algorithms='INDEX_SCAN,HASH_SCAN'

Steps to reproduce (taken from bug report: ):

We check on the replica using the default slave_rows_search_algorithms:

With slave_rows_search_algorithms = 'TABLE_SCAN,INDEX_SCAN' on the replica, replication doesn't break. This workaround is not available on 8.4, where the variable has been removed.

With PS 8.0.37-29-1 there is another issue: data drift occurs when using slave_rows_search_algorithms='INDEX_SCAN,HASH_SCAN' versus slave_rows_search_algorithms='TABLE_SCAN,INDEX_SCAN'. This does not apply to PS 8.4.0-1, since the slave_rows_search_algorithms variable no longer exists there.

 

We'll provide mysql-bin.064505 via a private channel to reproduce the issue.

Now we will do it again, and you will see the data drift

 

We are classifying this ticket as CRITICAL since it causes silent data divergence on 8.4 replicas with no workaround available.

Environment

None

AFFECTED CS IDs

CS0049630

Activity


Varun Arakere Nagaraju January 2, 2025 at 9:37 AM

Conclusion:

HASH_SCAN is not designed to handle cases where multiple updates target the same row during replication.

So, as a workaround, it is recommended to use slave_rows_search_algorithms='INDEX_SCAN,TABLE_SCAN' with an explicit primary key on the table.

Varun Arakere Nagaraju January 1, 2025 at 4:35 PM

Submitted the findings and the patch as contribution to upstream via https://bugs.mysql.com/bug.php?id=115741

Varun Arakere Nagaraju December 26, 2024 at 10:25 AM

Thanks for the suggestions, . I've addressed them as follows:

  1. Done.

  2. Changed the container from a list to a vector. However, the number of items to be inserted into the vector is unknown beforehand, because there can be any number of duplicates, so I've replaced the iterator with an index into the vector.

  3. Done.

Draft PR created.

Yura Sorokin December 19, 2024 at 4:23 PM

Looks absolutely great.

However, I have a few suggestions for further improvements (sorry ):

  1. We can change std::set<uchar *, Key_compare> m_distinct_keys to std::unordered_set<uchar *, Key_hash_functor, Key_equality_functor> m_distinct_keys. Since the only functionality we need from this container is checking whether a key already exists, the unordered version of the container suits much better here, although you will need to write your own Key_hash_functor and Key_equality_functor (similarly to Key_compare).

  2. It would be great if we could change std::list<uchar *> m_distinct_keys_list; to std::vector. However, we should be careful here, as inserting into this container may invalidate iterators (m_distinct_keys_list_itr). We can do this only when we know the number of items to be inserted in advance; in that case we can call the reserve() method on the container, which guarantees that push_back()s will not cause memory reallocation and iterator invalidation. If this is not possible, I would suggest changing m_distinct_keys_list_itr to a simple index into this vector: std::size_t m_distinct_key_idx.

  3. You can simplify the logic of inserting into m_distinct_keys: simply call insert() and check the boolean in the returned pair. You do not have to call find() first.

I think at this point it makes sense to create a draft PR for this bug.

Varun Arakere Nagaraju December 18, 2024 at 9:32 AM

Thanks for the suggestion . A simple change, using a linear list alongside the existing unordered_set, will fix this corner case.

The set is used only to check for duplicates during insertion, while the list is used later to traverse the keys in their original order.

Done

Details


Created September 17, 2024 at 9:48 PM
Updated April 17, 2025 at 1:57 PM
Resolved January 21, 2025 at 12:43 PM