InnoDB Transaction Locking and Scheduling Algorithm

Key Takeaway:

The transaction system of the MySQL InnoDB storage engine significantly boosts the overall performance of the system by ensuring transaction integrity and consistency through its transaction locking system, and by employing the CATS transaction scheduling algorithm.

The transaction locking system is a crucial component of the MySQL InnoDB storage engine. This system ensures the integrity and consistency of transactions. Given that InnoDB uses a row-level locking system, it offers a very granular level of data locking. This granularity significantly improves the system's performance in high-concurrency scenarios. Moreover, in MySQL version 8.0, InnoDB adopted a hotspot-aware transaction scheduling algorithm called CATS. This enhances the system's performance considerably, especially in situations where certain row locks become hot resources contended by multiple transactions. In this discussion, we will delve deeper into the transaction locking system and transaction scheduling algorithm of the InnoDB storage engine, offering a thorough understanding of both.

01 Introduction to InnoDB Transaction Locking

1.1 Overview of the Transaction Locking System

Locking mechanisms are used to manage concurrent access to shared resources. By locking these shared resources, it ensures that they are not accessed and modified by multiple users simultaneously, preventing data inconsistencies.

On the other hand, transaction locks coordinate concurrent access by multiple transactions to shared data, thus guaranteeing transaction consistency and integrity. There are various types of transaction locks.

The illustration below classifies different types of transaction locks. For example, based on the lock mechanism, transaction locks can be categorized as optimistic locks and pessimistic locks. Based on compatibility, they can be divided into shared locks and exclusive locks. By granularity, they can be sorted into table locks, page locks, and row locks. Based on lock mode, they can be categorized into record locks, gap locks, next-key locks, intention locks, and insert intention locks. In this discussion, we will focus on row-level locking in InnoDB.

[Image: Classification of different types of transaction locks]

In the Klustron distributed database storage node version 1.2, MySQL version 8.0.26 is used, which employs the InnoDB storage engine. Therefore, Klustron's storage nodes have a transaction locking system and transaction scheduling algorithm consistent with MySQL. Below is the overall architecture of the Klustron distributed database:

[Image: Overall architecture of the Klustron distributed database]

1.2 Introduction to InnoDB's Transaction Processing System

As illustrated below, InnoDB's transaction processing system is primarily built on three subsystems:

Transaction Subsystem: This contains the overall information of all transactions and the data structure of each individual transaction, such as transaction ID, transaction lock list, readview information, etc.
Transaction Lock Subsystem: This encompasses all relevant information about locks, the comprehensive details of all locks in the system, and the data structure of each lock object, such as the row lock object and the hash table of all row lock objects.
Readview Subsystem: This contains information about the readview that the current transaction can access, and it's mainly used to determine the visibility of other transactions to the current transaction.

These three subsystems are interconnected through pointers in their respective memory structures. Together, they form an integrated whole throughout the transaction processing procedure, underpinning the primary flow of transaction handling.

[Image: InnoDB's transaction processing system architecture]

02 In-Depth Explanation of InnoDB Row Locks

InnoDB's row lock is the most crucial type of lock within the transaction locking system. It determines a specific row lock corresponding to a piece of data through the Table ID (space id) + Page Number (page no) and Heap Number (heap no). One noteworthy aspect is that, to reduce the number of row lock objects held by a single transaction, InnoDB only creates one row lock object for data rows on the same data page held by the same transaction. It then marks the data row lock status by setting the bits in its bitmap.

As illustrated below, the row lock object lock_rec_t encompasses all relevant information about the row lock. The relationship between the row lock and transactions is also depicted.

[Image: Row lock object structure and its relationship with transactions]

2.1 Row Lock Acquiring Process

If there are no record locks on the page where the record resides, a lock object is directly created and added to rec_hash.
If there exists only one record lock on the page and it belongs to the current transaction, the corresponding bitmap's bit is directly set.
The system checks if there are locks conflicting with the current requested lock mode. If so, a lock object is created, added to the waiting queue, and a deadlock check is performed.

2.2 Row Lock Release Process

Row locks are typically released upon transaction commitment. Simultaneously, the system checks if other waiting sessions can be awakened. This is done by traversing all the waiting locks on the same page and determining whether these waiting locks can be awakened.

2.3 Row Lock Deadlock Detection

Deadlock detection is a preemptive checking mechanism designed to avoid situations where multiple transactions cannot continue execution due to the formation of cyclic locks.

The deadlock detection employs a depth-first traversal. It makes judgments based on constructing a transaction waiting graph. If an end-to-end lock request waiting loop is identified, it can be determined that a deadlock has occurred. If the detection depth is too long (i.e., the checking chain formed by lock-waiting sessions is very long), it is also treated as a deadlock. The default maximum depth is 200.

2.4 Issues with Deadlock Detection

Deadlock detection is conducted by holding a global overarching lock, which is very costly. In specific scenarios, such as hotspot updates, it's feasible to consider deactivating it and accelerating performance by setting the innodb_lock_wait_timeout.

MySQL 8.0 has optimized the row lock system and deadlock detection (lock partitioning, asynchronous deadlock detection, etc.)

The following chart represents our tests in hotspot update scenarios (high concurrency single-row updates). As concurrency increases, performance declines sharply.

[Image: Performance chart showing decline as concurrency increases in hotspot update scenarios]

Meanwhile, our Klustron will employ row lock-based optimizations to address the hotspot row update issues.

03 InnoDB's Transaction Scheduling Algorithm

The problem that the transaction scheduling algorithm addresses is which transaction should acquire the lock first when multiple transactions are waiting for a lock on the same object. There are two such algorithms: FCFS and CATS. Before MySQL 8.0, InnoDB consistently employed the FCFS algorithm. After that, it started adopting the CATS algorithm.

FCFS (First Come First Served): Scheduling is done in the order transactions request the row lock.

CATS (Contention-Aware Transaction Scheduling): Scheduling is determined based on the number of other transactions a given transaction blocks.

As illustrated below, since transaction t1 blocks 4 transactions and t2 blocks 3, t1 is given priority.

[Image: Illustration of CATS algorithm prioritizing transaction t1 over t2]

From the above explanation, it's evident that the CATS algorithm enhances the overall transaction processing capability when there are data access hotspots, i.e., when multiple transactions are vying for the same lock resource. A vivid analogy would be a traffic jam scenario where we can prioritize vehicles with more passengers, like buses, to enhance overall road throughput.

The detailed implementation of CATS is mainly handled at two points:

When adding each row lock, update all related transactions' edges (waiting relationships).
After the lock release during transaction scheduling, transactions are sorted based on their edge values, with those having higher edge values being prioritized. Also, all related transactions' edges are updated.

The following figure shows the optimized effect:

[Image: Chart showing improved TPS and reduced latency after adopting CATS algorithm]

As can be seen, the TPS (Transactions Per Second) exhibits a clear improvement after adopting the CATS algorithm, while the transaction latency or execution time significantly decreases.

04 Q&A

q1: Can you briefly explain the optimization Klustron plans to make for the hotspot row update issue?

a1: Regarding the issue of hotspot row updates, as the main problem is the prolonged time taken for deadlock detection due to an excessive number of row locks in high-concurrency scenarios, our proposed solution is to avoid having too many row locks on the same row of data. In other words, for hotspot rows, we can reduce the creation of a large number of row locks by using a pre-queueing approach. To put it more vividly, we let all the hotspot update transactions queue up first, then execute them one by one sequentially, instead of everyone rushing in, which would decrease efficiency.

q2: What optimizations does MySQL 8.0 offer for deadlock detection?

a2: MySQL 8.0 has been continuously optimizing its lock system, with two significant enhancements. One is the lock partitioning optimization, where row locks are partitioned to prevent interference between different lock partitions. The other is asynchronous deadlock detection optimization. This optimization primarily delegates the deadlock detection task to a background thread, allowing the working thread to move on to other tasks rather than waiting for deadlock detection to complete, thereby improving its efficiency.

InnoDB Transaction Locking and Scheduling Algorithm

# InnoDB Transaction Locking and Scheduling Algorithm

# 01 Introduction to InnoDB Transaction Locking

# 1.1 Overview of the Transaction Locking System

# 1.2 Introduction to InnoDB's Transaction Processing System

# 02 In-Depth Explanation of InnoDB Row Locks

# 2.1 Row Lock Acquiring Process

# 2.2 Row Lock Release Process

# 2.3 Row Lock Deadlock Detection

# 2.4 Issues with Deadlock Detection

# 03 InnoDB's Transaction Scheduling Algorithm

# 04 Q&A

# END

InnoDB Transaction Locking and Scheduling Algorithm

01 Introduction to InnoDB Transaction Locking

1.1 Overview of the Transaction Locking System

1.2 Introduction to InnoDB's Transaction Processing System

02 In-Depth Explanation of InnoDB Row Locks

2.1 Row Lock Acquiring Process

2.2 Row Lock Release Process

2.3 Row Lock Deadlock Detection

2.4 Issues with Deadlock Detection

03 InnoDB's Transaction Scheduling Algorithm

04 Q&A

END