MySQL Technical Insights Series: In-Depth Look at JSON Features

Key Takeaway：

As MySQL's official support for JSON data processing continually enhances, the use of JSON data within MySQL is set to become even more efficient and convenient.

Amidst the burgeoning evolution of internet applications, the demands for storing and handling semi-structured and unstructured data have amplified across all database systems. MySQL has tailored a suite of features to adeptly manage such data. In this talk, we'll explore how MySQL works with JSON data, with a special focus on JSON data indexing, aiming to provide a comprehensive understanding of how MySQL handles JSON data.

1 JSON Data Processing in MySQL

JSON (JavaScript Object Notation) is a lightweight data exchange format primarily used for internet application services. It's often considered a type of semi-structured data.

To keep pace with the rapid development of internet applications, MySQL has supported JSON data processing since version 5.7 and added more JSON data support in subsequent versions.

Using JSON in MySQL has some great perks:

• More flexible data: You can change the contents of fields quickly without adding columns. JSON data is organized in key-value pairs, which lets you easily add or remove information without messing with columns like in traditional tables.

• Saves storage space: It avoids having to store empty values (NULLs) in sparse fields, saving you some storage space.

• Better indexing of JSON elements: Unlike JSON in string format, JSON types support indexing for specific query optimizations. Currently, the latest version 8.0 supports index-based generated columns and multi-value indexing, making searches even faster and more efficient.

• Easy queries across data types: It's easier to query a mix of structured and unstructured data. Since both types can be placed in the same table, you can use one SQL statement to query all of this data.

Methods to store and retrieve JSON in MySQL:

• Create tables with JSON fields.

• Insert or modify JSON data just like any other field types.

文本描述已自动生成

• Query JSON field content as you would with string types.

AnimalsInfo Table Data

• Query specific property values directly by specifying the property names of the JSON field.

Fetch Json Values

• Delete JSON values using JSON_REMOVE.

文本描述已自动生成

• Update property values using JSON_SET, JSON_INSERT, JSON_REPLACE.

Update Json Values

In addition, MySQL provides the following commonly used functions to process JSON data:

• JSON_EXTRACT: Extracts the required field content from JSON data.

• JSON_UNQUOTE: Removes the outermost double quotes.

• MEMBER OF: Can only be used for JSON arrays. Checks if an element exists in a JSON array, returning 1 for presence, 0 for absence.

• JSON_CONTAINS: Can be used for JSON arrays and JSON objects. Checks whether one or more elements exist in JSON arrays, or whether there is a certain value or a certain path (Key) under the specified path for JSON objects.

• JSON_OVERLAP: Compares for common elements in JSON arrays or matching key-value pairs in JSON objects, returning 1 for a match and 0 for a non-match.

• JSON_KEYS: Returns keys from JSON objects, including nested ones.

2 JSON Data Indexing in MySQL

To facilitate quick and efficient querying of information within JSON data, MySQL provides indexing features specifically designed for JSON data. This section focuses on how MySQL establishes indexes for JSON data.

Indexing JSON via a generated column:

1: MySQL allows indexing JSON data by creating generated columns and adding indexes to them. For example:

ALTER TABLE customers ADD COLUMN zip_code INTEGER GENERATED ALWAYS as (user_info->"$.zipcodel");

ALTER TABLE customers ADD INDEX zip_code (zip_code) USING BTREE;

2: Starting from MySQL version 8.0.13, you can also create function indexes directly to index JSON data. For example:

ALTER TABLE customers ADD INDEX zip_code (CAST(user_info->"$.zip_code" as INTEGER)) USING BTREE;

图示描述已自动生成

In the above example, users establish a generated column on the user_info JSON field and create a B-tree index on this column, enabling fast searches for the zip_code information within the user_info.

Multivalue indexing:

Starting from version 8.0.17, MySQL allows indexing JSON arrays directly using multivalue indexes. For example:

• Creating a multivalue index:

CREATE INDEX zips ON customers ( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );

ALTER TABLE customers ADD INDEX comp(id, modified, (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );

• Querying using the index:

SELECT * FROM customers WHERE 94507 MEMBER OF(custinfo->'$.zipcode’);

SELECT * FROM customers WHERE JSON_CONTAINS(custinfo->‘$.zipcode’,CAST(‘[94507,94582]’ AS JSON));

SELECT * FROM customers WHERE JSON_OVERLAPS(custinfo->'$.zipcode',CAST('[94507,94582]' AS JSON));

图示描述已自动生成

In the above example, for the zip_code information within the JSON column "custinfo," which contains multiple values in the form of an array, users can create a B-tree multivalue index in MySQL as shown above to enable fast searches for such data.

3 JSON Data Processing in Klustron

Currently, Klustron provides basic support for JSON data. Let's take a look at the overall architecture of Klustron:

As shown in the figure above, Klustron is a typical distributed database system that separates storage and computation. The computing nodes are based on PostgreSQL, leveraging its powerful query optimization capabilities. The storage nodes, on the other hand, utilize the more efficient MySQL as the underlying storage engine.

At the moment (Klustron 1.2 version), Klustron only supports the use of PostgreSQL's JSON functions and operators to access JSON data.

We are actively working on integrating the computing and storage nodes to handle JSON data. This approach will leverage the indexing capabilities of the storage nodes to accelerate the processing of JSON data, enabling more efficient storage and querying of JSON data.

图示描述已自动生成

4 Q & A

Q1: Can you provide a brief comparison between data handling and indexing in PostgreSQL and MySQL?

A: As far as I know, MySQL has relatively slower progress in supporting JSON data compared to PostgreSQL. PostgreSQL has provided support for JSON data for a longer time and offers a wider range of indexing options, including B-tree, HASH, GiST, and GIN indexes for JSON data. In contrast, MySQL only supports B-tree indexes for JSON data. Therefore, in terms of features, PostgreSQL surpasses MySQL in supporting JSON data. However, in terms of performance, we haven't conducted a direct comparison. Those who are interested can perform some tests, and it is likely that each database has its own strengths in different scenarios. However, considering the latecomer advantage, MySQL may have more room for performance optimization in the future.

Q2: How should we decide whether to use a JSON field or a regular field?

A: While using JSON data has its advantages, it is not suitable for all scenarios. Handling JSON data incurs additional overhead, such as parsing JSON and data type conversions, which can result in slightly longer data access time compared to regular fields. Therefore, the decision to use JSON data should be based on the specific use case and data characteristics. For example, if the data has minimal variability, rarely requires changes to the table structure, requires frequent access, and demands high performance, traditional structured data fields would be more appropriate. On the other hand, if the data structure frequently changes, data access is less frequent, and performance requirements are not as high, using JSON could be considered.

MySQL Technical Insights Series: In-Depth Look at JSON Features

# MySQL Technical Insights Series: In-Depth Look at JSON Features

# 1 JSON Data Processing in MySQL

# 2 JSON Data Indexing in MySQL

# 3 JSON Data Processing in Klustron

# 4 Q & A

# END

MySQL Technical Insights Series: In-Depth Look at JSON Features

1 JSON Data Processing in MySQL

2 JSON Data Indexing in MySQL

3 JSON Data Processing in Klustron

4 Q & A

END