Skip to main content

KlustronDB Global Deadlock Detection Technology

KlustronDBAbout 10 min

KlustronDB Global Deadlock Detection Technology

Transaction deadlock is a common phenomenon in database systems based on transaction locks, and there are corresponding deadlock handling mechanisms in single-node databases such as MySQL and Oracle. In distributed database systems like KlustronDB, global deadlocks may occur, and the deadlock handling mechanisms of storage nodes cannot resolve them. A global deadlock handling mechanism at the distributed database cluster level is required to detect and resolve them. This article mainly introduces the mechanism of global deadlock occurrence and the KlustronDB cluster's global deadlock detection and handling mechanism.

This article first introduces why transactions executed in parallel in a single-node database can cause deadlocks, as well as the methods for detecting and resolving them. It then introduces global deadlocks in distributed cluster scenarios, and the global deadlock handling mechanism of the KlustronDB cluster.

Basic Knowledge, Mechanism, and Hazards of Database Transaction Deadlocks

Database systems usually use transaction locks for concurrency control to prevent two operations that have logically conflicting relationships from interfering with each other. For example, if a user in connection conn1 wants to drop table t1; and another user in connection conn2 initiates an insert into t1 to insert data into t1, then these two operations are mutually conflicting; or if conn1 wants to modify a row R1, and conn2 also wants to modify R1, then these two operations are also conflicting. To avoid these conflicts, the database system locks the table, page, and/or row where the data resides when reading or writing data. After obtaining the lock, it executes the operation, and then releases the transaction locks when the transaction commits. Some database system storage engines, such as InnoDB, use Multi-Versioned Concurrency Control (MVCC) for concurrency control of read operations and use transaction locks for concurrency control of write operations. MVCC does not acquire any transaction locks, so if MVCC is used for concurrency control of read operations, read operations are not blocked by write operations and deadlocks do not occur, allowing the system to achieve higher concurrency performance.

As can be seen from the example above, transaction locks are divided into multiple levels, including table-level, page-level, and/or row-level. To execute any DDL or DML statement, a table-level transaction lock must first be obtained. Some systems support page-level transaction locks, some support row-level transaction locks, and some support both. The smaller the granularity of the transaction lock, the higher the concurrency. Table-level locks obtained by DML are usually intention locks, so table-level locks obtained by multiple DML operations do not conflict and therefore do not affect concurrency performance; table-level locks obtained by DDL are usually non-intention locks, so a DDL statement will conflict with other DDL or DML statements on the same table.

The conflict matrix defines the conflict relationships of transaction locks on the same database object in the database. The transaction lock system determines whether two transaction locks conflict based on the conflict matrix. The lock types and conflict definitions may vary slightly in different database systems, but the lock types and conflict relationships shown in the diagram below are common to the conflict matrices of all transactional database systems.

T.X and T.S are table-level exclusive and shared locks. Usually, executing a DDL statement on a table requires first obtaining its T.X.

T.IX: Table-level intention write lock, T.IS: Table-level intention read lock: Table-level intention locks are usually acquired by DML statements and can prevent the table from being truncated/dropped/renamed, etc., when performing insert, delete, update, or select operations.

R.X: Row-level write lock. To modify, insert, or delete a row, you must first obtain a row-level write lock. If read operations use MVCC for concurrency control, a row-level shared lock is not needed; otherwise, row-level and table-level shared locks R.S and T.S are required.

In the matrix, X indicates that two lock requests on the same table or the same row of a table are conflicting, and lock requests from multiple transactions cannot be granted simultaneously; √ indicates that the two lock requests do not conflict and are compatible, allowing multiple transactions to obtain the transaction locks they requested at the same time.

conflict-matrix

Conflict Matrix Example

Deadlocks in a database occur naturally. The occurrence of a deadlock is neither an error nor a bug in the database system or application system. The database system must be able to automatically detect and resolve deadlocks, then return an error to the client. The client needs to handle the error returned when executing the statement. The usual approach is to roll back the transaction, but for some applications, a failed DML operation can be retried. Therefore, the application software can also decide to re-execute the same DML operation after receiving a deadlock error, or execute it again after making appropriate adjustments, or even directly ignore the failed statement (this method is relatively rare).

There are the following three conditions for a database deadlock to occur:

  1. If a new lock cannot be obtained, the transaction will be blocked and cannot continue. Otherwise, it may read completely meaningless and unreasonable data that is being modified, which is also called 'dirty data'.

  2. Locks that have been acquired are not released until the transaction ends. This is also called Two-Phase Locking (2PL), which is an effective method to reduce deadlocks and is also a requirement of the ACID properties of database system transactions.

  3. Forming a circular wait: Two transactions, trx1 and trx2, each acquire the transaction lock needed by the other, which creates a circular wait condition. The figure below is an example of a circular wait condition.

cycular-wait

Example of circular wait in deadlock

The Dangers of Database Transaction Deadlocks and How to Resolve Them

If a deadlock cannot be resolved quickly, it will cause query timeouts, statement failures, a decrease in system throughput, an increasing number of locked objects and blocked transactions, and the database system will gradually become unable to continue operating.

Currently, common transactional storage engines support transaction deadlock detection and resolution. The method is to perform deadlock detection either when a transaction lock cannot be acquired or periodically by a deadlock handler: scanning the lock system's wait relationships, finding lock wait cycles, and then choosing one transaction in the cycle as the so-called victim to reject its lock request. In this way, the DML statement being executed by this victim transaction will return an error, and the client needs to handle this error, for example, by rolling back the transaction or re-executing the statement.

The InnoDB storage engine in MySQL has its own deadlock detection mechanism, so deadlock detection and resolution should be done according to this method.

The Formation Mechanism of Global Deadlocks in KlustronDB Distributed Database Clusters

The storage node KlustronDB_storage of KlustronDB currently supports two storage engines: InnoDB and RocksDB. Both use row-level transaction locks for concurrency control in insert, update, and delete operations. Therefore, transaction branches will acquire row locks (for InnoDB, there may also be special forms of transaction locks such as GAP locks, but for global deadlock handling, no distinction is required). This can lead to lock-wait relationships between transaction branches, thereby forming global deadlocks.

Global deadlock is a new type of database transaction deadlock that occurs at the global level of a distributed database cluster and cannot be detected or resolved by the deadlock handler of a single storage node. However, its mechanism and conditions of occurrence are exactly the same as in the case of a single-node database mentioned above.

The figure below is an example of a global deadlock. Transaction branch GT1.T11 of GT1 on the primary node SM1 of shard1 updates R1, and transaction branch GT2.T22 of GT2 on the primary node SM2 of shard2 updates R2. Then, transaction branch GT1.T12 on SM2 attempts to update R2, and when trying to acquire a row-level write lock on R2, it is blocked by GT2.T22. This causes GT1 to wait for GT2, denoted as GT1 -> GT2;

At the same time, the transaction branch GT2.T21 of GT2 on SM1 wants to update R1, and it is blocked by GT1.T11 when attempting to acquire a row-level write lock on R1. This causes GT2 to wait for GT1, denoted as GT2 -> GT1;

This creates a circular wait condition, which is a global deadlock. From the perspective of SM1 and SM2, no deadlock has occurred because, from SM1's view, there is only a lock wait relationship T21->T11, and from SM2's view, there is only a lock wait relationship T12->T21. Therefore, neither SM1 nor SM2 can detect or resolve this global deadlock. At this point, the global deadlock handler of KlustronDB is required for global deadlock detection and resolution.

gtxn-cycle

Global deadlocks have the following characteristics:

  1. The local waiting of transaction branches on storage nodes causes global transactions to wait, and the global transactions form a circular wait relationship.

  2. No deadlock occurs within each storage node (even if it does occur, it will be automatically resolved by the node itself, which is not discussed in this section), and storage nodes cannot automatically detect or resolve global deadlocks.

  3. Global transactions in a global deadlock cycle can be transactions initiated from multiple computing nodes, or they can all come from the same computing node.

Global deadlock in the Community Edition Percona-MySQL-server

For the community edition of Percona-MySQL-server, although it supports both InnoDB and RocksDB storage engines, it does not provide deadlock detection and resolution mechanisms at the MySQL server layer. This can lead to deadlocks similar to those described above occurring within a percona-server instance: suppose there are tables t1 and t2, with t1 using the InnoDB engine and t2 using the RocksDB engine. Transaction T1 updates a row R1 in table t1, while transaction T2 updates a row R2 in table t2. Then T1 attempts to update t2.R2 and T2 attempts to update t1.R1. At this point, T1 and T2 experience a deadlock at the MySQL server layer.

This deadlock cannot be detected by either InnoDB or RocksDB, because they can only see the lock tables and locked objects within their own engine. That is, InnoDB sees T1 waiting for T2, and RocksDB sees T2 waiting for T1, but neither sees the circular wait between T1 and T2, so they cannot release the lock and can only wait for the lock timeout to complete the unlock. However, lock timeouts usually take several seconds to ensure that a normally running database system with a heavy load does not frequently fail statements due to lock timeouts. This can lead to a significant decrease in data update performance at the MySQL server layer due to deadlocks.

KlustronDB's global deadlock detection mechanism can detect and resolve cross-engine deadlocks within a single storage node, thereby significantly improving system performance.

Unlocking method for global deadlocks in KlustronDB distributed database clusters

Like deadlocks in standalone databases, global deadlocks also occur naturally; they are neither bugs in the database system nor errors in the application software. KlustronDB can automatically detect and resolve global deadlocks and return an error to the client in the connection of the victim. The client must also handle this error: roll back the transaction, or re-execute the statement (after adjusting it), or ignore the error and continue with the next statement (this approach is less common).

Discovering and handling global deadlocks in KlustronDB involves the following steps: First, it is necessary to obtain the current local transaction lock waiting relationships within each storage node from the master nodes of all storage shards in the cluster, in order to construct a global transaction lock waiting relationship graph. Then, this graph is traversed to find waiting cycles. When a cycle is found, a victim is chosen from the cycle to deny its lock request, causing the client to experience a statement execution error, and the error handling method is as described above.

Construct a global transaction wait-for graph

The global deadlock detector (gdd) of KlustronDB obtains the local transaction lock wait graphs g1, g2, ... gn from each shard primary node M1, M2, ... Mn in the cluster. The SQL statement used is as follows. This statement is sent by the deadlock detection background process to the primary node of each shard of KlustronDB.

图片3

Merge g1, g2...gn into a graph G: a global transaction wait-for graph.

Why can the above query statement obtain the local wait relationship g within each storage node? Because information_schema.innodb_trx is a system view in MySQL that provides basic information about each running transaction. What we are interested in are the running XA transactions, because only they can form global deadlocks as branches of a global transaction; Information_schema.data_lock_waits is a view of transaction lock wait relationships, from which we can know the wait relationship between any two InnoDB transactions. As in the SQL statement above, a three-table join can obtain the wait relationships of active XA transactions within each storage node, which is gi.

In any local wait-for graph gi, the wait-for relationship of a single shard's transaction branch implies the wait-for relationship of the global transaction it belongs to: ti -> tj => GTi -> GTj, where ti is the transaction branch of GTi on shard_i.

Taking GT1, GT2, ..., GTx as nodes, with their waiting relationships as edges, the resulting wait-for graph is denoted as G. Since KlustronDB supports parallel execution of queries between compute nodes and storage nodes, a compute node can asynchronously send INSERT/DELETE/UPDATE statements to the primary nodes of multiple shards. These statements may then form local wait-for relationships across multiple shards, which leads to each global transaction node in G potentially having multiple outgoing edges—meaning a global transaction may simultaneously depend on multiple global transactions. KlustronDB's GDD can correctly handle such complex situations.

Starting from Klustron-1.3, support for global deadlock detection and resolution for the RocksDB engine was added. Therefore, the global deadlock handler of KlustronDB_server also obtains lock wait relationships from the RocksDB transaction state view of KlustronDB_storage. These two sets of lock wait relationships are used to construct a global lock wait graph, thereby supporting transactions that access hybrid engines (InnoDB and RocksDB) as well as transactions that use a single engine (InnoDB or RocksDB).

Detection and Resolution of Global Deadlocks in KunlunBase

After obtaining G, perform graph traversal in G to look for (multiple) global transaction wait loops.

After finding a loop, select a victim within the loop according to a certain elimination rule, and kill its statements in all shards; this resolves a loop in G. Currently, KlustronDB supports several global deadlock elimination strategies, which can be selected by setting the configuration option global_deadlock_detector_victim_policy. The strategy names generally indicate the logic of the strategy and are listed below. Each strategy may have its rationale, so only adjust it if precise tuning is really necessary; otherwise, use the system default setting.

KILL_OLDEST: Kill the transaction that has been running the longest

KILL_YOUNGEST: Kill the transaction with the shortest runtime

KILL_MOST_ROWS_CHANGED: Kill the transaction that updated the most rows

KILL_LEAST_ROWS_CHANGED: Kill the transaction that updated the fewest rows

KILL_MOST_ROWS_LOCKED: Kill the transaction that has locked the most rows

KILL_MOST_WAITING_BRANCHES: Kill the transaction that has the most transaction branches waiting for transaction locks

KILL_MOST_BLOCKING_BRANCHES: Kill the transaction that blocks the most other transactions with its branches

The code implementation of KlustronDB global deadlock detection and resolution is located at

https://gitee.com/zettadb/kunlun/blob/main/src/backend/access/remote/remote_xact.c

Global deadlock test code: We have tested various types https://gitee.com/zettadb/kunlun/blob/main/src/test/kunlun/gdd/gdd2.py

During the design and testing of the global deadlock detection algorithm, we considered all the deadlock loop situations shown in the figure below. Of course, this algorithm can handle all global deadlock situations, including more complex cases.

图片4

Triggering and Client Handling of Global Deadlock Detection in KunlunBase

The KlustronDB global deadlock detection mechanism is a module of the kunlun-server compute node. After the compute node starts, it runs as a background process called the global deadlock detection daemon (gdd). If during the execution of a DML statement (insert, delete, update), a statement sent to the storage node does not return within a certain time (start_global_deadlock_detection_wait_timeout), the compute node will notify the global_deadlock_detector process to trigger a round of global deadlock detection and resolution. At the same time, gdd also runs periodically in the background, with the interval set by global_deadlock_detector.naptime=3s.

The global transaction execution statement selected as the victim will return the error ER_QUERY_CANCELED to the client. By default, the transaction is automatically rolled back within the compute node, and the user will ignore all subsequent statements in this transaction until the transaction ends.

KlustronDB also supports MySQL's transaction processing mode. If enable_stmt_subxact = true is set before starting the transaction, then statement execution errors will not automatically roll back the transaction, but will be handled by the client code. This mechanism applies to all errors, not just deadlock errors; it also applies to both MySQL and PostgreSQL connections. When the client receives any statement execution error, it can ignore the error and continue executing subsequent SQL statements, and then commit the transaction; or re-execute this SQL statement and the following statements, and then commit the transaction; or directly roll back the transaction, and then the transaction can be re-executed.

Kunlun-server records every global transaction rolled back during each global deadlock detection in its runtime logs for users to trace and inspect.

Summary

The KlustronDB global deadlock detection mechanism quickly and effectively discovers and detects global deadlocks, ensuring the efficient and smooth operation of the KlustronDB cluster. Application software developers need to correctly handle deadlock errors as described in this document. In this way, the application system can effectively handle deadlocks that occur during the operation of the KlustronDB distributed database cluster and maintain efficient operation of the application system.