Error Handling for Two-Phase Commit in Distributed Transactions
1. Background
In Klustron, our team has designed the distributed transaction processing mechanism to avoid the well-known pitfalls of the classic two-phase commit algorithm.
Building on the principles of two-phase commit for distributed transaction processing, we have improved its disaster recovery and error handling capabilities, ensuring that at no time can the failure of any node, a network failure, or a timeout in a Klustron cluster cause the data managed by the cluster to become inconsistent or lost.
This article elaborates on the principles and mechanisms of error handling in Klustron's two-phase commit for distributed transactions, as well as the latency cost it incurs.
2. How does Klustron handle errors in the two-phase commit algorithm?
In production deployments of a distributed database cluster, typically fewer than 0.01% of distributed transaction commits encounter errors. We must nevertheless handle every possible error, because even one transaction in 100 billion that fails to commit correctly can result in incorrect data for users.
The database system is designed to ensure that transactions are always committed correctly, and ACID guarantees always hold, without exception.
This is more complex for distributed database systems than for single-node databases, as there are more possible sources of errors (multiple computing nodes, multiple storage nodes, and their network connections).
This is why the design and implementation of database systems are so complex, and that of distributed database systems is even more complex.
Next, let's look at how a Klustron cluster handles errors that occur while committing a distributed transaction. We will discuss error handling for each phase of the two-phase commit, as well as error handling for batch commit-log writing.
2.1 Error Handling in Phase One
As shown in the above figure, if a statement error, network disconnection, or timeout occurs during the prepare phase, GTM submits a rollback record request to GTSS, immediately sends a rollback command to the failing node (dropping a timed-out connection without waiting for its response), and returns an error to the client, informing it that the global transaction (GT) has been rolled back.
GTSS records the commit decision of the GT in the commit log as ROLLBACK, so that cluster_mgr can subsequently roll back its prepared transaction branches.
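The following sketch illustrates this decision flow in simplified form. It is a minimal illustration only: names such as gtss.log_decision, shard.xa_prepare, and shard.send_xa_rollback are hypothetical and do not correspond to Klustron's actual internal APIs.

```python
# Simplified, hypothetical sketch of phase-one error handling.
# The object methods used here are illustrative assumptions,
# not Klustron's real internal interfaces.

class StatementError(Exception):
    pass

class NetworkError(Exception):
    pass

def prepare_phase(gt, shards, gtss, client):
    """Run XA PREPARE on every shard the GT wrote to; abort the GT on any failure."""
    failed = False
    for shard in shards:
        try:
            shard.xa_prepare(gt.id)
        except (StatementError, NetworkError, TimeoutError):
            failed = True
            break

    if failed:
        # Record the decision as ROLLBACK so that cluster_mgr can later
        # roll back any branch that did manage to prepare.
        gtss.log_decision(gt.id, "ROLLBACK")
        for shard in shards:
            # Best effort: send the rollback without waiting for a reply;
            # a timed-out connection is simply dropped.
            shard.send_xa_rollback(gt.id, wait=False)
        client.report_error(f"global transaction {gt.id} was rolled back")
        return False
    return True
```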
2.2 Error Handling in Batch Commit Logging
As shown in the above figure, if an error or timeout occurs while GTSS is writing the commit log, GTM rolls back all prepared transaction branches of the GT, that is, it sends XA ROLLBACK to every storage cluster the GT has written to, and then returns 'Aborted' to the client regardless of the result.
Even if an XA ROLLBACK fails to reach a storage cluster, that transaction branch will still be rolled back by cluster_mgr as expected.
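A similar sketch for the commit-log write path, continuing the hypothetical names from the example above:

```python
def write_commit_log(gt, shards, gtss, client):
    """Persist the COMMIT decision between the two phases.

    Hypothetical sketch only; gtss.log_decision and shard.send_xa_rollback
    are illustrative names, not Klustron's actual code.
    """
    try:
        # GTSS batches commit-log records from many GTs into one durable write.
        gtss.log_decision(gt.id, "COMMIT")
    except (NetworkError, TimeoutError):
        # No durable COMMIT record exists, so the GT must abort.
        for shard in shards:
            # Best effort; cluster_mgr will roll back any branch we miss,
            # because it will find no commit-log record for this GT.
            shard.send_xa_rollback(gt.id, wait=False)
        client.report_error("Aborted")
        return False
    return True
```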
2.3 Error Handling in Phase Two
As shown in the above figure, if a network error or timeout occurs during the second phase, Klustron still returns a successful commit to the client.
This is because once a distributed transaction's commit decision has been recorded in the commit log, the transaction must eventually complete its commit.
If any computing or storage node fails or has network problems during the second phase, the cluster_mgr process will handle these transaction branches according to the instruction in the commit log. If the instruction is to commit, all transaction branches of the GT will be committed.
If the instruction is to roll back, or no commit log record for the GT can be found, all transaction branches of the GT will be rolled back.
Even if a computing node crashes or loses its network connection during the second phase, the transaction will still be committed. In this case, the application backend (i.e., the database client) will see its commit statement hang until the database connection times out (the application layer will usually let the end-user connection time out as well) or returns a disconnection error.
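The recovery rule that cluster_mgr applies can be summarized in a few lines. Again, the names below (find_commit_log, xa_commit, xa_rollback) are illustrative assumptions rather than Klustron's actual code:

```python
def resolve_prepared_branch(gt_id, shard, gtss):
    """Resolve a prepared-but-undecided transaction branch after a failure.

    Hypothetical sketch of the rule described above: commit if and only if
    a COMMIT record exists in the commit log, otherwise roll back.
    """
    decision = gtss.find_commit_log(gt_id)  # may return None if no record exists
    if decision == "COMMIT":
        # The commit decision was durably recorded, so the branch must commit.
        shard.xa_commit(gt_id)
    else:
        # Decision is ROLLBACK, or no commit-log record was found at all.
        shard.xa_rollback(gt_id)
```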
3. Latency cost
Because both the prepare and commit phases of two-phase commit must wait for the storage engine to flush its WAL, and between the two phases the commit log must be written to the metadata cluster, a two-phase commit necessarily takes longer than committing the same SQL DML statements in a single phase.
According to this performance report: http://www.zettadb.com/blogs/perf-cmp1, Klustron's two-phase commit increases latency by approximately 30 milliseconds on ordinary server hardware and gigabit networks.
On higher-grade commercial server hardware and networks, the increase will be less than 30 milliseconds. These 30 milliseconds include the time needed to write the commit log, to wait for the additional phase to execute, and all of the additional network communication.
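As a rough way to see where the extra time goes, the added latency can be thought of as the sum of the components listed above. The numbers in the sketch below are purely illustrative placeholders, not measurements from the report:

```python
# Back-of-the-envelope decomposition of the extra commit latency added by
# two-phase commit. All numbers are illustrative placeholders, NOT measurements.
extra_wal_flush_ms   = 5.0   # extra WAL flush from the additional (prepare) phase
commit_log_write_ms  = 10.0  # durable commit-log write to the metadata cluster
extra_network_rtt_ms = 15.0  # additional round trips between computing and storage nodes

added_latency_ms = extra_wal_flush_ms + commit_log_write_ms + extra_network_rtt_ms
print(f"estimated added latency: {added_latency_ms:.1f} ms")  # ~30 ms in this illustration
```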
4. Conclusion
Klustron's distributed transaction processing mechanism provides consistency and disaster tolerance when executing and committing distributed transactions. It guarantees that no node or network failure during transaction commit can break the ACID guarantees, thereby keeping user data correct.
In version 0.9, Klustron will support global MVCC consistency. We will introduce the working mechanism of Klustron's global MVCC in another article.