Skip to main content

MySQL 8.0 Instant DDL

KlustronAbout 4 min

MySQL 8.0 Instant DDL

Key Takeaway

Instant DDL in MySQL 8.0 is one of the most significant features of this version, offering immediate completion of certain DDL operations by merely updating the data dictionary, thus significantly enhancing DDL execution performance.

The Instant DDL feature in MySQL 8.0 stands out by only modifying the data dictionary, drastically optimizing the execution efficiency of certain DDL operations, and enabling the immediate execution of tasks such as adding or removing columns. This presentation will mainly cover the principles and implementation methods of Instant DDL, along with new features developed by Klustron based on this foundation. The aim is to deepen the understanding of this pivotal feature and provide a more comprehensive insight into Klustron.

01 Introduction to MySQL DDL

Let’s take a moment to explore the evolution of MySQL DDL.

  • Before MySQL 5.5, only the Copy method for DDL operations was supported, requiring data duplication and prohibiting concurrent writes to the table being modified;
  • Starting with MySQL 5.5, Inplace DDL operations were supported, allowing for changes directly on the existing table data without the need for copying, but still not supporting concurrent writing;
  • MySQL 5.6 introduced support for Online DDL, enabling most DDL operations to be performed concurrently with reads and writes, significantly reducing the overall impact of DDL operations on the system;
  • MySQL 8.0 introduced support for Instant DDL, enabling immediate completion of DDL operations by modifying only the data dictionary and not the data itself.

02 In-depth Look at MySQL Instant DDL

Why?

The need for Instant DDL arises from several factors:

  • The time-consuming nature of executing DDL on large tables, especially in replication scenarios;
  • Executing some DDLs requires double the disk space;
  • The substantial consumption of IO, memory, and CPU resources by some DDL operations;
  • In replication scenarios, DDL execution on the replica can take a long time to synchronize with the primary.

Below is the execution process for an Online DDL:

From the diagram above, we can see that before MySQL 8.0, executing a DDL operation required three stages, involving complex processes such as creating a new table, importing data into the new table, and finally switching between the new and old tables. This process was not only intricate but also resource-intensive, requiring extensive disk, memory, and CPU resources, leading to low DDL execution efficiency and impacting the overall system performance.

Therefore, a new method for executing DDLs was needed to make them both fast and efficient.

Principles of Instant DDL:

The core principles of Instant DDL include:

  • Modifying only the data dictionary, without copying or altering any historical data.
  • Employing flags to distinguish between old and new data row formats.
  • Post MySQL 8.0.29, using row versioning to identify the corresponding structure of data rows.

The comparison between the Instant and copy methods for DDL execution, illustrated above, highlights the efficiency of the Instant method. Adding a column takes merely 0.09 seconds with Instant DDL, in stark contrast to the 29.17 seconds required by the copy method—a significant difference!

The following image showcases the table structure versioning implemented in version 8.0.29:

This diagram illustrates that table structures in MySQL 8.0.29 are versioned, which can be used to differentiate between various table structures.

Instant DDL Execution Process:

To illustrate the differences between Instant DDL and Online DDL, let’s examine the following diagrams from Alibaba's Database Kernel Monthly:

The process of adding a column with Online DDL is shown here:

Contrastingly, the Instant DDL process for adding a column is depicted as follows:

From these comparisons, it's evident that Instant DDL, by merely updating the data dictionary, eliminates the need for action during the execution phase, drastically shortening process time and significantly enhancing the efficiency of schema changes.

Internal Implementation of Instant DDL:

Instant DDL modifies only the data dictionary without altering the data. But how does MySQL differentiate between data structures post-DDL?

:The essence of Instant DDL lies in InnoDB's sophisticated internals, notably:

  • Identifying new data formats with info bits;
  • Saving the current row's field count information by adding field quantity information;
  • Post MySQL 8.0.29, using row versioning to determine the corresponding structure.

Here's the data structure of a table record unaltered by Instant DDL.

After a column is added through Instant DDL, new record structures appear as:

A comparison shows that new records set the INSTANT_FLAG bit in the record header's info-bits, indicating a new record inserted after a schema change. The field count (in red) provides the current record's number of fields.

Considering DML operations post-Instant DDL:

  • Inserts are made in the new format.
  • Searches discern between new and old versions via flags.
  • Updates convert data to the new format.
  • Deletions remain unaffected.

Instant DDL ingeniously separates new and old data row structures, facilitating the recognition and access of varying version data within a single table.

03 Klustron's DDL

First, let's briefly introduce the core architecture of our distributed data product Klustron:

Klustron’s Distributed Compute-Storage Separation Architecture

  • Compute Layer (Klustron-server): Composed of multiple PostgreSQL instances, the compute nodes are responsible for accepting connection requests from application software, processing SQL query requests from established connections, executing these requests, and returning the results.
  • Storage Layer (Klustron-storage): Formed by three or more MySQL 8.0 instances, the storage nodes make up a storage cluster (termed as a shard), with each shard holding a portion of user tables or table partitions.
  • Metadata Cluster: Stores Klustron cluster metadata, including topology structure, node connection information, DDL logs, commit logs, and other cluster management logs.
  • Cluster_mgr: Manages the correct status of the cluster and nodes, facilitating cluster administration, logical and physical backup and recovery, and horizontal scaling capabilities.

Next, we'll discuss Klustron's Online DDL feature.

Klustron’s Online DDL (Repartition)

Process: Source table data is exported and loaded into a target table. Concurrent updates to the source table are also migrated. Steps include:

  1. Export Full Table Data: node_mgr calls mydumper to export source table data to the compute nodes.
  2. Load Full Table Data: node_mgr calls the kunlun_loader tool to inject the full dump of the source table data into the target table.
  3. Binlog Catch-Up: node_mgr employs binlog2sync tool, starting from the recorded binlog position at the time of the dump on each shard, to dump binlog events.
  4. Rename Source and Target Tables: The binlog2sync tool quickly synchronizes the remaining binlogs, then renames the target table to the source table's name, restoring normal business operations.

The Future of Klustron DDL:

Beyond introducing the essential Online DDL feature, Klustron is set to expand its DDL capabilities, including:

  • Enhancements and optimizations to Online DDL, leveraging parallel processing for improved efficiency.
  • Transactional DDL, adding transactional properties to DDL beyond mere atomicity.
  • Instant DDL improvements, enabling more DDL operations to be executed instantly.
  • DDL operations designed for parallel processing.

Expanding possibilities and more.

04 Q&A

Q1: Does the execution of Instant DDL require locking?

A1: Instant DDL requires a very brief MDL (metadata lock) during the final commit phase, specifically during the switch between old and new table structures, to prevent users from accessing incorrect table structures. This locking period is extremely short and involves no complex operations, essentially completing instantaneously with no impact on system performance.

Q2: Where can I try Klustron?

A2: Those interested in Klustron can download a trial version from our official website and follow the installation guide for setup. Additionally, we offer Klustron's serverless services on Amazon Marketplace and Alibaba Cloud for those who would like to explore further.

END