Skip to main content

GIS-related Features of MySQL and Klustron

KlustronAbout 4 min

GIS-related Features of MySQL and Klustron

Introduction

In the wake of mobile internet's exponential surge, Geographic Information System (GIS) data is now widely used across diverse landscapes. MySQL offers features like spatial indexes and spatial data processing functions, while PostGIS enhances PostgreSQL but faces single-machine resource constraints. Klustron adopts the PostGIS extension, boosting GIS data management scale and performance.

In the sections below, we will discuss MySQL's underlying GIS technologies and Klustron's GIS support and advantages, providing a deeper understanding of their implementation and principles.

Key Takeaway: MySQL's spatial data indexes and spatial data processing functions furnishes crucial foundational support to the Geographic Information Systems.

01 Introduction to MySQL GIS Features

Geographic Information System (GIS) is a computer technology system that, based on the geospatial database and supported by computer hardware, carries out the collection, management, and analysis of spatial-related data.

[Image: GIS system overview]

MySQL has started to support basic spatial data types and some spatial object functions since version 4.1. Subsequently, the MyISAM storage engine incorporated spatial indexing. In version 5.7, the InnoDB engine introduced spatial indexing and adopted the Boost.Geometry library to support GIS. Version 8.0 underwent further optimizations, such as the inclusion of the Spatial Reference System (SRS).

[Image: MySQL GIS evolution timeline]

Klustron employs a distributed architecture that separates computing and storage. The upper computing nodes use PostgreSQL, while the lower storage nodes operate with MySQL instances, communicating through data channels. To facilitate GIS, Klustron has implemented modifications at both the computing and storage levels, mainly:

  • At the computing level, the Klustron-server nodes, being based on PostgreSQL, can integrate PostGIS to offer GIS services to users, helping PostGIS overcome single-machine server resource constraints. Klustron-server's distributed query processing mechanism interacts with the storage level Klustron-storage instances for GIS data access and storage based on their internal formats.
  • At the storage level, Klustron-storage maximizes MySQL's GIS data processing capabilities to realize the transactionality and indexing capabilities of GIS data.

Illustration of the overall architecture of the Klustron distributed database:

[Image: Klustron distributed database architecture diagram]

02 Detailed Explanation of MySQL GIS Features

MySQL Spatial Data Types and Functions

MySQL supports two primary spatial data formats: WKT (Well-Known Text) and WKB (Well-Known Binary). These GIS data formats are stored internally as BLOBs within the InnoDB storage engine.

The main spatial data types supported by MySQL include:

GEOMETRY (general category), POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, and GEOMCOLLECTION.

The principal function categories for operating spatial data in MySQL include:

Construction: Point(x,y), LineString(…), etc.;

Format Conversion: ST_GeomFromText(…), ST_LineFromText(..), etc.;

Object Attributes: ST_Dimension(), ST_GeometryType(), etc.;

Relationship Calculation: ST_Contains(), ST_Crosses(), ST_Within(), etc.;

Object Generation: ST_Union(), etc.;

Others: ST_GeoHash(), ST_IsValid().

Spatial Reference System (SRS)

The Spatial Reference System (SRS) refers to the combined plane coordinate system and elevation coordinate system used to determine the planar position and elevation of geographic targets.

SRID (Spatial Reference ID) refers to the ID of the spatial reference system.

[Image: SRS coordinate system illustration]

As illustrated above, when there is no SRID or the SRID is 0, spatial objects reside in a pure Cartesian plane coordinate system, meaning they don't have a defined coordinate position. When the SRID is X or Y, spatial objects are located within a specific coordinate system, where their position can be determined.

Commonly Used SRIDs include:

  • SRID=0 Cartesian Plane Coordinate System (Default)
  • SRID=4326 GPS Coordinate System
  • SRID=3857 Web Map Projection Coordinate System

As shown in the example below, SRID can be specified when creating a table.

[Image: Example of specifying SRID when creating a table]

MySQL 8.0 has introduced SRID constraints on spatial data. Creating a spatial index requires the corresponding column to be NOT NULL and have a specific SRID. Columns without an SRID attribute are considered non-SRID constrained, and they will accept values from any SRID coordinate system, but the optimizer cannot use a spatial index on such columns lacking an SRID.

Here is an example of this constraint:

[Image: Example of SRID constraint]

Spatial Index (R-Tree)

Starting from MySQL 5.7, InnoDB has provided index support for GIS data, adopting the R-tree index.

R-tree Structure: Similar to the B-tree index, the R-tree is also a tree-structured index. The difference is that the R-tree index key value is MBR (Minimum Bounding Rectangle).

Leaf nodes contain the MBR and point to clustered index records, while non-leaf nodes contain pointers to leaf nodes and the MBR composed of the corresponding leaf node records.

Below is a typical structure of an R-tree index:

[Image: R-tree index structure diagram]

It can be observed that the MBRs of records in leaf nodes A and B are included in the MBR-N of their parent node. For instance, in a query aiming to find A, the search would sequentially traverse from the root node T, through N, down to A.

R-tree Concurrency Control: Since MBRs lack a concept of order, unlike B-trees, R-trees employ predicate locks for concurrency control and transaction isolation. Predicate locks refer to the logical predicates of certain DML operations, such as "within 100 meters around point A." This condition becomes a predicate, serving as a basis to determine access conflicts when accessing certain data records.

R-tree operations and concurrency control are quite advanced topics. We can delve deeper into these subjects later on if there is interest.

Here is an example of creating a spatial index:

[Image: Example of creating a spatial index]

03 MySQL GIS vs PostGIS or Klustron

The table below compares some functionalities between MySQL GIS and PostGIS:

[Image: Comparison table between MySQL GIS and PostGIS]

Through the comparison, we find that overall, PostGIS has a certain advantage in terms of functionality over MySQL. Additionally, PostGIS supports 8500 SRSs, while MySQL supports 5151, with 59 being supported by MySQL but not by PostGIS. Furthermore, in terms of spatial operation functions and operators, PostGIS is much richer than MySQL.

GIS in Klustron

We have previously introduced Klustron's architecture and some of its GIS features in detail.

Klustron's support for GIS includes:

  • Klustron supports all GIS functions and operators of PostGIS;
  • Klustron supports the intersection of SRSs from PostGIS and MySQL;
  • Klustron also supports PostgreSQL's native geometry and geography types, including point, lseg, path, circle, polygon, box types, etc.;
  • Klustron has supported multidimensional data by extending MySQL to store multidimensional data.

Usage of the PostGIS plugin in Klustron:

  • Query the pg_mysql_srs_basics table at the computing nodes to view the SRS collection supported by MySQL.

The method of creating a PostGIS extension in Klustron includes:

  • set create_system_table=true;
  • create extension postgis;
  • set create_system_table=false;

04 Q&A

q1: How does the GIS performance of PostGIS compare to that of MySQL?

a1: As we have mentioned before, functionality-wise, PostGIS holds certain advantages over MySQL. In terms of performance, we haven't conducted detailed evaluations, but based on some information obtained from the industry, PostGIS also holds certain advantages in performance. After all, PostGIS has the first-mover advantage over MySQL, with a higher level of maturity. Therefore, having a performance advantage is not surprising.

q2: What improvements will Klustron make in terms of GIS in the future?

a2: Currently, Klustron has already acquired the capability to handle GIS data based on PostGIS and MySQL GIS, and compared to both, it has the blessings of a distributed architecture, making its functionality and performance superior. In the future, Klustron will prioritize addressing some compatibility issues in GIS, such as the handling of null values, etc. Then, we will continue to explore possible optimization points and enhance functionalities to provide better support to users.

END