Skip to main content

Live Graphic and Textual Record|MySQL Technical Insights Series: Explore the Adaptive Hash Index Feature

KlustronAbout 6 min

Live Graphic and Textual Record|MySQL Technical Insights Series: Explore the Adaptive Hash Index Feature

Key Takeaway:

The Adaptive Hash Index (AHI) of MySQL's InnoDB storage engine boosts system performance significantly in optimal scenarios by caching the relationship between query conditions and data pages, allowing for swift data location that meets specific criteria.

The Adaptive Hash Index (AHI) is a pivotal feature of MySQL's InnoDB storage engine. With this feature, the system can achieve a substantial performance increase in certain scenarios. Ever since the introduction of this feature in MySQL version 5.5, it has been continuously refined and enhanced by the official team. In this talk, we'll delve deeply into this feature, providing a comprehensive understanding for everyone.

01 The Principle of AHI

Introduction to Adaptive Hash Index (AHI)

The Adaptive Hash Index is a feature developed by the InnoDB storage engine specifically to speed up point queries on large tables. Its core idea is to cache frequently queried key values and their corresponding information in the B-tree, allowing for direct data page location. This approach saves the overhead of locating from the root node, enhancing query performance.

Starting with MySQL version 5.5, the AHI feature was introduced and has been continuously improved in subsequent versions. Even in the latest 8.0.33 version, this feature is enabled by default.

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/1.png)

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/2.png)

For the Klustron distributed database storage node version 1.2, it uses MySQL 8.0.26, and AHI can be enabled via configurations. Here is the overall architecture of the Klustron distributed database depicted in the figure below:

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/3.png)

Internal Mechanics of AHI

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/4.png)

First, let's examine the two B-tree structures within InnoDB as illustrated above: the primary key index on the left, and the secondary key index on the right. When we aim to locate a record with a primary key value of 5, we begin from the root node of the primary key's B-tree index, traverse all the way down to its leaves, reaching the second page at the bottom on the left and locating the record (5…). However, when we wish to search for a record with a secondary index key value of 35, we initially navigate the left secondary key's B-tree index, zeroing in on the secondary index record (35,5), and then using the corresponding primary key 5 to locate the complete data record (5…) in the primary key index on the left.

As demonstrated below, AHI establishes a hash index from the primary key 5 to the data page where the data resides. This allows for direct pinpointing to the data page where record (5…) is located, which is the second data page on the left. Hence, the entire process of searching from the root node to the leaf node becomes redundant.

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/5.png)

The core data structure of AHI, as shown below, consists of multiple hash tables. It can be seen as the hash index of the B+ tree index.

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/6.png)

AHI possesses the following characteristics:

• AHI is globally shared.

• The number of partitions (hash tables) is specified by the parameter innodb_adaptive_hash_index_parts.

• Partitions are associated via index id.

In detail, the construction and usage of AHI proceed as follows:

Building of AHI:

• Condition 1: The index has been accessed >17 times.

• Condition 2: A specific query condition on the index has been used >100 times (index->search_info).

• Condition 3: A particular leaf data page on the index is frequently queried (times exceed 1/16 of the page's record count).

• Other Conditions: No index has been constructed for this data page, or there's data variation, etc.

If all the above conditions are met, a hash index will be established for the records of that page.

Utilizing AHI:

• The index is selected and hit by the optimizer.

• A specific query condition on the index matches a common pattern (index->search_info).

• A hash value is generated based on the query condition, leading to the identification of the corresponding record's physical page.

02 Scenarios Where AHI Excels:

AHI's primary advantage lies in its capability to save the overhead of locating leaf nodes, especially when the B-tree has multiple levels. Thus, its most beneficial scenarios would be those dominated by high read operations and minimal write operations, and where locating leaf nodes takes precedence. Examples include:

• Frequent point queries based on equality.

• Secondary index back-to-table queries.

• Queries with IN conditions.

Here's an example:

CREATE TABLE `sbtest1` ( 

            `id` int(11) NOT NULL AUTO_INCREMENT, 

             `k` int(11) NOT NULL DEFAULT '0‘, 

             `c` char(120) NOT NULL DEFAULT ‘’, 

             `pad` char(60) NOT NULL DEFAULT ‘’, 

             PRIMARY KEY (`id`), 

            KEY `k_1` (`k`) ) ENGINE=InnoDB;

This table includes a primary key (id) and a secondary index column (k). Each value of k could correspond to multiple primary keys, meaning that k can have duplicate values. As a result, queries for k might require multiple back-to-table operations.

We'll define the redundancy of k as how many rows of primary table data each unique k corresponds to. The query SQL is: SELECT SUM(ASCII(c)) FROM sbtest1 WHERE k=?.

Here are the test results for this secondary index back-to-table query:

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/7.png)

As we observe, with a redundancy of 10 for K, AHI boosts QPS by an average of 24% and up to 47%. The performance enhancement is rather evident.

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/8.png)

And, when K has a redundancy of 50, AHI elevates QPS by an average of 43%, peaking at 63%. This marks an even more significant performance increase.

Let's now explore another test scenario, specifically focusing on queries with the IN condition.

Query SQL: SELECT SUM(ASCII(c)) from sbtest1 where id in (…)

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/9.png)

With a list length of 50 for IN, AHI escalates QPS by an average of 74%, reaching up to 123%. Pretty impressive, isn't it?

Thus, AHI, when applied in its favorable scenarios, offers substantial enhancements in QPS. This is likely the reason why it's turned on by default in the official release.

03 Issues with AHI

While AHI offers significant performance boosts, it's not without its flaws. After diving into the advantages of AHI, let's also highlight some of its limitations, giving you a comprehensive view of the feature.

Issue 1: Jitter during Drop Table. Specifically:

• During a Drop table, AHI cleanup is required. This operation is time-consuming. As shown in the diagram below, with 100 million rows of data and running select.lua at 100 concurrent sessions for 600s, executing drop table takes around 66 seconds (compared to around 10 seconds without AHI).

• Dropping a large table that holds related locks for more than 600 seconds causes other threads to continuously wait until the signal times out after 600s, leading to an automatic crash of mysqld.

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性\10.png)

Root of the Issue:

The key process of DROP TABLE cleaning the Buffer Pool when the Adaptive Hash Index (AHI) is enabled:

• Identify every index of the table to be deleted.

• For each index, identify every page recorded in the AHI (within the Buffer Pool).

• Scan each record within the page, resulting in the record set 'R'.

• Delete indices of records in 'R' from the AHI.

Thus, when a table being dropped has numerous records cached in the AHI, each record needs individual deletion, making the operation extremely time-intensive.

Ways to Circumvent:

1: Upgrade to version 8.0.23 or later (https://dev.mysql.com/worklog/task/?id=14100).

2: Disable AHI before deleting a large table.

It's worth noting that the storage node in Klustron database version 1.2 utilizes version 8.0.26, so this particular issue doesn't exist there.

Issue 2: AHI construction bottleneck.

Only a single thread can modify the hash table. Concurrent threads attempting to construct the AHI are held up, waiting for the X-lock on this hash table, effectively blocking the critical path of the query. As illustrated in the diagram below, there's a noticeable performance dip during the test, attributable to the AHI construction bottleneck.

![](MySQL内幕系列分享-深入了解Adaptive Hash Index特性/11.png)

Root of the Issue:

AHI Construction Process:

  1. Search the B-tree and pinpoint the leaf node, followed by an update of statistical information.

  2. Based on the statistical data, decide whether or not to add the current page's records to the AHI's hash table.

  3. When inserting records into the hash table, an X-lock needs to be acquired on the AHI partition corresponding to the index.

Thus, in high-concurrency situations, only a single thread can insert AHI records, with all other threads being blocked.

Ways to Circumvent:

1: Upgrade to version 8.0.30 or later (https://bugs.mysql.com/bug.php?id=100512).

2: Disable AHI.

04 Q&A

q1: Is it pointing to the data page, or can it pinpoint the specific offset within the page?

a1: Entries within AHI correspond to the query conditions of the leaf data page, and don't specify the exact offset. So, once positioned, an additional internal page search is required.

q2: What's the difference between AHI and Hash index?

a2: AHI is a hash index of the B-tree index, whereas the hash index is an index of the data.

q3: Is AHI non-persistent (i.e., doesn't write to disk)?

a3: Yes, AHI only resides in memory, occupying buffer pool memory space, but it doesn't use disk space.

q4: Does TRUNCATE table operation also slow down?

a4: Yes, both drop table and truncate table are affected by issue 1.

q5: In our actual work, how should we determine whether to enable AHI?

a5: It depends on the workload. If the workload is read-heavy with fewer writes and primarily involves equality or 'IN' queries, consider enabling AHI. If it's more write-intensive, it's recommended to disable AHI. Also, if an upgrade to version 8.0.26 isn't feasible, it's suggested to disable AHI before dropping frequently accessed large tables.

END