HTAP Database Capability Series Sharing: Read-write Separation
Highlights of this live stream:
With its read-write separation capability, Klustron can fully separate OLTP and OLAP workloads, avoiding resource contention and performance interference between them.
01 Why Do We Need Read-Write Separation?
When a single server can no longer handle the load, there are generally two approaches to scaling a database horizontally: horizontal sharding and read-write separation.
Horizontal sharding distributes data across multiple database instances according to a certain rule, so that when the application accesses the data, the request can be routed to the corresponding shard based on that rule.
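As a minimal illustration of rule-based routing, consider hashing a sharding key to pick an instance. The shard names and the modulo rule below are invented for this sketch; they are not Klustron's actual scheme.

```python
# Hypothetical rule-based routing for horizontal sharding.
# Shard names and the modulo rule are illustrative assumptions.

SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def route(user_id: int) -> str:
    """Route a request to a shard by hashing the sharding key."""
    return SHARDS[user_id % len(SHARDS)]

print(route(42))  # every request for user_id=42 lands on the same shard
```

Because every request with the same key always maps to the same shard, each shard's master can serve its reads and writes independently.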
Read-write separation assigns a portion of the business's read requests to the slave databases in a master-slave architecture.
Both approaches have their advantages. With sharding, each shard can be read and written independently, which also improves the system's write capacity. Moreover, there is no need to consider master-slave delay when accessing each shard's master database.
The disadvantage of sharding is that, as data is distributed across multiple shard master databases, some data access logic becomes more complicated, such as cross-shard join statements. In contrast, read-write separation only requires adding slave databases when scaling out, without redistributing the master database's data, making it more cost-effective.
Especially with compute-storage separation solutions like AWS Aurora, the cost of adding a new slave database is very low, making the cost difference between the two approaches even more apparent.
Read-write separation solutions must account for slave database delay, so the business needs to be clear about which statements can use the read-write separation strategy. In HTAP scenarios, most analytical queries can tolerate a delay of seconds but consume substantial computing resources, which makes the read-write separation function essential for HTAP databases.
A read-write separation solution should be transparent to the business code. For architectures without an intermediate layer, where the business layer connects directly to the database, a common solution is to use a third-party service such as ZooKeeper to maintain master-slave information and let the client's connection framework implement the read-write separation logic. For architectures with an intermediate layer, mature middleware has built-in read-write separation capabilities, so the business can access the database with an ordinary client. In cloud environments, the intermediate-layer solution is more widely used.
Klustron routes requests to the storage layer through the computation layer, as shown in the following architecture diagram:
[Klustron architecture diagram]
The client can enable the read-write separation feature by executing "set enable_replica_read=on;".
02 Issues in Implementing Read-write Separation
2.1 Selection Strategy for Slave Database
When there are multiple slave databases, the general strategy is to choose the one with the minimum delay. Of course, if slave databases differ in configuration or load, they are assigned different weights and selected according to those weights, as long as their delays meet the requirements.
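A minimal sketch of such a selection strategy: filter out replicas whose delay exceeds a threshold, then pick among the rest by weight. The field names and the one-second threshold below are illustrative assumptions, not Klustron's implementation.

```python
import random

# Hypothetical replica selector: among replicas whose replication delay is
# within the threshold, pick one at random, weighted by configured weight.
# Field names and the threshold are illustrative assumptions.

def pick_replica(replicas, max_delay_ms=1000, rng=random):
    candidates = [r for r in replicas if r["delay_ms"] <= max_delay_ms]
    if not candidates:
        return None  # fall back to the master when no replica qualifies
    weights = [r["weight"] for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

replicas = [
    {"name": "replica_a", "delay_ms": 120, "weight": 2},
    {"name": "replica_b", "delay_ms": 90,  "weight": 1},
    {"name": "replica_c", "delay_ms": 5000, "weight": 3},  # too far behind
]
chosen = pick_replica(replicas)
print(chosen["name"])  # replica_a or replica_b; replica_c is filtered out
```

Returning `None` when no replica qualifies models the fallback of sending the query to the master.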
In principle, write requests access the master database, while read requests access the slave database. However, not all read requests can be processed by the slave database. For instance, in an explicit transaction, if data rows are updated first, subsequent queries can only continue to access the master database: the slave database cannot see the updates, because the transaction has not been committed yet and no binlog has been generated. Only by continuing to access the master database can transaction isolation be preserved.
Klustron applies its read-write separation strategy when the user's SQL meets all of the following conditions at the same time:
1. The current SQL type is select;
2. The SQL does not contain user-defined functions (i.e., functions created by the create function statement), unless the current transaction is a read-only transaction;
3. If the statement is not in a transaction (autocommit=on), read-write separation is allowed. If the statement is in an explicit transaction, it must meet one of the following requirements:
a) if in a read-only transaction, read-write separation is allowed;
b) if in a read-write transaction, the transaction must not have updated any data yet.
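The conditions above can be sketched as a single routing predicate. The function and parameter names here are illustrative, not Klustron internals.

```python
# Sketch of the routing rule described above, assuming a simplified
# session model; names are illustrative, not Klustron internals.

def can_read_from_replica(sql_type, has_udf, in_transaction,
                          read_only_txn, txn_has_written):
    if sql_type != "select":
        return False          # only selects are candidates
    if has_udf and not read_only_txn:
        return False          # UDFs allowed only in read-only transactions
    if not in_transaction:
        return True           # autocommit=on: route freely
    if read_only_txn:
        return True           # explicit read-only transaction
    return not txn_has_written  # read-write txn with no writes yet

# A SELECT inside a read-write transaction that already updated a row
# must stay on the master:
print(can_read_from_replica("select", False, True, False, True))  # False
```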
The remote query optimizer then sends the corresponding SQL execution plan to the slave database's node for execution.
2.2 Master-Slave Delay
Beyond the implementation considerations above, users of read-write separation must be aware of master-slave delay: because of it, read queries routed to the slave database are not guaranteed to return the latest data.
This delay is unavoidable. MySQL's binlog-based synchronization has four modes: asynchronous replication, semi-synchronous replication, fullsync (Klustron's mechanism), and transactional strong-consistency synchronization.
Transactional strong-consistency synchronization means the transaction is applied to the slave database via the binlog before the master database acknowledges the successful commit to the client. However, this approach sacrifices a lot of performance and availability, so it is rarely used in production.
For the other three synchronization mechanisms, when the master database acknowledges the client, it cannot guarantee that the transaction has been replayed on the slave database.
Therefore, before using read-write separation, businesses must be aware of and accept the existence of master-slave delay.
If the delay is unacceptable, using only the master database is an option. Following the rules above, a Klustron read-write separation cluster offers two ways to force access to the master database: executing "set enable_replica_read=off;" in the session to turn off read-write separation, or sending a SQL statement that writes data or takes a write lock, after which all query statements in that transaction access only the master database.
2.3 "Bug" in Consistent Reads
In the implementation of database read-write separation, there is an easily overlooked consistent-read "bug". Let's take a closer look.
First, let's review the mechanism of the repeatable-read isolation level.
In the database, there is an InnoDB table with two columns (id, c). The id column is the primary key, and there are only two rows of data with initial values of (1,1) and (2,2) respectively, as shown below:

Assume we perform the following operation sequence with read-write separation disabled (by executing "set enable_replica_read=off;" at the beginning of session1, all subsequent requests go only to the master database):
In session1's third select statement, the c value seen for the second row is 2, because session1's transaction was created before session2's update statement, so the visible version of the row with id=2 is (2,2). This matches the behavior of the repeatable-read isolation level.
Now let's turn on the read-write separation strategy and run the same operation sequence. The result of session1's third select is different: the c value of the row with id=2 is now 20, as shown below:

The reason is that after read-write separation is enabled, session1's transaction view is first created on the slave database. Only when session1 executes its first update statement is a transaction view created on the master database. By then, session2 has already updated and committed the row, so the updated value 20 is visible in the master database's transaction view.
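The timing can be illustrated with a toy MVCC model: the row keeps a history of committed versions, and a snapshot sees the newest version committed at or before the snapshot's timestamp. The timestamps and version list below are invented for this sketch.

```python
# Toy MVCC model of the timing issue above. Timestamps and the version
# history are invented to illustrate why the second snapshot sees c=20.

versions = [(5, 2), (15, 20)]  # (commit_ts, c) history of the row id=2

def read(snapshot_ts):
    """Return the newest version committed at or before snapshot_ts."""
    visible = [v for ts, v in versions if ts <= snapshot_ts]
    return visible[-1]

replica_view = read(10)  # view created on the slave, before session2's commit
master_view = read(20)   # view created on the master, after the commit at t=15
print(replica_view, master_view)  # 2 20
```

The two snapshots were taken at different moments, so the transaction effectively reads from two different views, which is what breaks repeatable-read semantics.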
The severity of this problem depends on the business's need for the repeatable-read isolation level and requires evaluation by the business. We will leave the improvement plan for this problem for future discussion in the live streaming.
03 Q&A
Question:
After enabling read-write separation, the behavior is inconsistent with the behavior of repeatable-read. How can we solve it?

Answer:
One option is to create a transaction view on the master database at the same moment the transaction is initiated on the slave database. This approach can resolve the inconsistent reads that occur under the repeatable-read isolation level when read-write separation is enabled.
Discussion Group(@Abu)Answer:
The key issue is to find a way to filter out the newly written rows of another transaction through a readview or a similar mechanism. This is actually very similar to the problem of cross-node consistency reads in distributed databases, and there are three possible solutions:
When MySQL on the storage node uses binlog replication, a hidden column holding a transaction number can be added to each row, with the number guaranteed consistent across all nodes (both master and slave). Both read-write and read-only transactions must obtain a global transaction number; when a write transaction commits, it obtains a global transaction number and writes it to the hidden column of each modified row. When a read transaction opens, it compares the transaction number it obtained with each row's commit transaction number to determine visibility: a row is visible when its commit transaction number is less than the number obtained when the read transaction opened. When the select statement in the diagram above completes and an update is performed, the transaction switches from read-only to read-write. At that point, a new transaction is initiated on the master database using the transaction number obtained when the read-only transaction was opened on the slave database. Since the transaction number was generated on the slave, data invisible to the slave is also invisible to the master.
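The core visibility rule of this first idea can be sketched as follows; all names and numbers are illustrative.

```python
# Sketch of the first idea: each committed row stores a global commit
# transaction number in a hidden column; a row is visible to a reader
# iff its commit number is below the number the reader obtained when
# its transaction opened. All names and numbers are illustrative.

def visible(row_commit_txn_no, reader_txn_no):
    return row_commit_txn_no < reader_txn_no

reader_txn_no = 100  # obtained globally when the read transaction opened
rows = [(90, "old"), (100, "concurrent"), (120, "newer")]
print([val for no, val in rows if visible(no, reader_txn_no)])  # ['old']
```

Carrying the slave-generated transaction number over to the master when the transaction turns read-write is what keeps both nodes filtering with the same cutoff.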
If the master and slave storage nodes can replicate via redo, the master records the start and end of each transaction in the redo log, and the slave can construct an active-transaction list, accept read requests, and support MVCC. Since redo guarantees that master and slave transaction numbers are consistent, the readview of the slave database's read transaction can be directly associated with the read-write transaction initiated on the master database. The master can then read the data visible to the slave's readview plus the data written by its own transaction ID, without reading data written by other transactions.
If the master and slave storage nodes can replicate via redo and the slave database can accept read requests, it is possible to send all read-write requests to the master database and all read requests to the slave database. After the update operation in the diagram above completes, the select is still sent to the slave database. The slave's readview then needs to know the transaction ID of the write transaction associated with the read transaction on the master. As long as the LSN applied from the slave database's redo log exceeds the LSN of the master's update operation, the slave can read the data written by that write transaction, while data written by other transactions is filtered out by the readview.
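The wait-for-LSN part of this third idea can be sketched with a toy model in which a background thread stands in for redo apply on the slave; all LSN values and timings are invented.

```python
import threading
import time

# Toy model of the third idea: before serving a read that must observe
# the master's update, the replica waits until its applied redo LSN
# reaches the LSN of that update. All LSN values and timings are invented.

applied_lsn = 0
lock = threading.Lock()

def replica_apply(up_to):
    """Background redo apply, advancing the replica's applied LSN."""
    global applied_lsn
    for lsn in range(1, up_to + 1):
        with lock:
            applied_lsn = lsn
        time.sleep(0.001)

def wait_for_lsn(target, timeout_s=2.0):
    """Block until the replica has applied redo up to `target`."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with lock:
            if applied_lsn >= target:
                return True
        time.sleep(0.001)
    return False  # replica too far behind; caller could fall back to master

t = threading.Thread(target=replica_apply, args=(50,))
t.start()
ok = wait_for_lsn(42)  # LSN of the master's update, known to the session
t.join()
print(ok)  # True once the replica catches up to LSN 42
```

The timeout models the practical trade-off the discussion points at: if the replica lags too far, the read either blocks or must be redirected to the master.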
Mr. Ding Qi(replied@Abu)Answer:
In this reply by Abu, there are three solutions provided: one meets the expectation, one exceeds the expectation, and one greatly exceeds the expectation. I applaud these ideas.
The first approach uses global transaction IDs, which can either be written into the binlog or generated by a global transaction manager; it also lets multiple read nodes jointly provide read-write separation.
The second, redo log replication, is the essence of Aurora, and this solution resembles the thread-shared readview function implemented by Klustron.
The third represents the complete and authentic form of read-write separation. It is hoped that future hardware can reduce latency to a level where this solution does not affect performance.
Discussion Group(@Abu)Answer:
Thank you for the guidance from Mr. Ding Qi. Although it is one thing to say it, there are still many details to consider in order to achieve practical implementation. Ultimately, there may be many trade-offs. What really matters is being able to successfully implement and apply it to the business. I look forward to new features and capabilities of Klustron in the future.
Mr. Zhao Wei(replied@Abu)Answer:
We will add support for query result consistency in read-write separation within the global MVCC feature of Klustron, but the technical approach differs from the method suggested by @Abu. It will involve significant kernel modifications to the binlog system and InnoDB. We will provide a detailed explanation after the release of version 1.2.
04 Summary
Database read-write separation is a technique that directs a database's read and write operations to different servers, enhancing the performance and availability of the database system. By routing read operations to slave servers, the workload on the master server is reduced, improving the system's performance and throughput.
Combining HTAP technology with read-write separation yields a more efficient, faster, and more cost-effective solution for database processing. Applied to a read-write separation architecture, HTAP can serve both real-time transaction processing and complex analytical processing, while read-write separation improves the database's performance and availability.
