Klustron Quick Start Guide

1. Basic Concepts

Computing Node:

A computing node accepts and validates client connections using either the PostgreSQL protocol or the MySQL protocol, and interacts with the storage nodes of the cluster to execute SQL statements for the connected clients.

Computing nodes are stateless and can be added as workload increases. Each computing node can serve read/write requests for users.

Computing nodes have global transaction processing capabilities and ensure data consistency and integrity across storage nodes using the two-phase commit protocol.
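The commit rule behind two-phase commit can be sketched in a few lines of shell. This is only an illustration of the general protocol, not Klustron's implementation: the function name `two_phase_decision` and the "yes"/"no" vote strings are hypothetical. Each argument stands for one storage node's reply to the prepare phase.

```shell
#!/bin/sh
# Hypothetical sketch of the two-phase commit decision rule; not Klustron code.
# Each argument is one storage node's vote from the prepare phase: "yes" or "no".
two_phase_decision() {
    # With no participants there is nothing to commit.
    if [ "$#" -eq 0 ]; then
        echo "abort"
        return 1
    fi
    for vote in "$@"; do
        # Phase 1: a single "no" (or any unexpected reply) forces a global abort.
        if [ "$vote" != "yes" ]; then
            echo "abort"
            return 1
        fi
    done
    # Phase 2: every participant prepared successfully, so commit everywhere.
    echo "commit"
}
```

For example, `two_phase_decision yes yes` prints `commit`, while `two_phase_decision yes no` prints `abort`. The transaction takes effect on all storage nodes or on none.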

Each computing node of a Klustron distributed database cluster holds the metadata of all database objects (tables, views, materialized views, sequences, stored procedures/functions, users/roles, and privileges). Computing nodes do not store user data locally; user data resides on the storage nodes.

Computing nodes support both MySQL and PostgreSQL protocols.

Storage Node:

A storage node is where user and business data is stored.

Each storage node stores all or part of the user data. A group of storage nodes consisting of one master node and its slave nodes is called a shard.

Storage nodes receive SQL commands from computing nodes to insert, update, or delete user data, or to return data to the computing nodes. They achieve high data reliability through one-master-multi-slave replication, using either strong synchronization or MySQL Group Replication (MGR).

With Klustron's fullsync (strong synchronization) mechanism, after a transaction is committed on a Klustron-storage shard (storage cluster), the shard must receive ACKs for that transaction's binlog from fullsync_consistency_level slave nodes before it returns a success confirmation to the client.
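The acknowledgement rule can be expressed as a simple threshold check. The sketch below is illustrative only: the function name `fullsync_can_confirm` is hypothetical, and it models just the counting rule described above, not the actual replication code.

```shell
#!/bin/sh
# Hypothetical sketch of the fullsync acknowledgement rule; not Klustron code.
# $1: number of slave nodes that have ACKed the transaction's binlog so far
# $2: the configured fullsync_consistency_level
fullsync_can_confirm() {
    acks_received=$1
    required_acks=$2
    if [ "$acks_received" -ge "$required_acks" ]; then
        # Enough slaves hold the binlog: success may be returned to the client.
        echo "confirm"
    else
        # Keep waiting for more ACKs before answering the client.
        echo "wait"
    fi
}
```

With fullsync_consistency_level set to 1, `fullsync_can_confirm 0 1` prints `wait` and `fullsync_can_confirm 1 1` prints `confirm`.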

Metadata Cluster:

Metadata nodes store the metadata of the entire Klustron cluster, including user-defined data, node connection information, storage node information, transaction information, and more.

Cluster Manager:

The cluster manager runs as a daemon and maintains the replication status of each storage cluster and its nodes, as well as metadata and status synchronization between cluster computing nodes and storage nodes. The cluster manager is also responsible for handling specific failures in distributed transactions.

2. Quick Experience

You can quickly try out the cluster structure and basic functions of Klustron with the Klustron all-in-one Docker image, which packages all the nodes of a Klustron cluster into a single image.

In this image, a small-scale data cluster is simulated with the following nodes:

  • Three klustron-storage nodes that form a 3-replica Meta Shard.
  • Six klustron-storage nodes that form two Data Shards, with each shard having three replicas.
  • Three klustron-server nodes that act as independent computing nodes and process client data requests.

Each klustron-storage node in this cluster has a buffer pool size of only 64MB, so the cluster is suitable only for functional testing, not for performance or stress testing.

To quickly experience Klustron using this image, follow these steps:

Step 1:

Requirements: A Linux server with at least 8GB of available memory, 20GB or more of disk space, and Docker installed. The server must also have direct access to the internet.
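The requirements above can be checked before you start. The following is a minimal pre-flight sketch, not part of Klustron: all function names are hypothetical, it assumes a Linux host where /proc/meminfo exposes MemAvailable, and it assumes Docker stores its data on the root filesystem. It does not test internet access.

```shell
#!/bin/sh
# Pre-flight sketch for the Step 1 requirements. Helper names are illustrative.

# Succeeds when integer VALUE ($1) is at least MINIMUM ($2).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Available memory in whole GB, read from /proc/meminfo (Linux only).
available_memory_gb() {
    awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Free disk space in whole GB on the filesystem that holds the given path.
free_disk_gb() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Prints one line per unmet requirement; succeeds only if all are met.
preflight_check() {
    failures=0
    mem_gb=$(available_memory_gb)
    disk_gb=$(free_disk_gb /)   # assumption: Docker data lives on "/"
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed or not on PATH"
        failures=$((failures + 1))
    fi
    if ! meets_minimum "${mem_gb:-0}" 8; then
        echo "need at least 8 GB of available memory, found ${mem_gb:-0} GB"
        failures=$((failures + 1))
    fi
    if ! meets_minimum "${disk_gb:-0}" 20; then
        echo "need at least 20 GB of free disk space, found ${disk_gb:-0} GB"
        failures=$((failures + 1))
    fi
    [ "$failures" -eq 0 ]
}
```

Running `preflight_check && echo "ready"` prints `ready` only when all three checks pass. Values are truncated to whole gigabytes, so a host with 7.9 GB available is reported as 7 GB.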

Log in to the server and run the following command to install the Klustron cluster Docker environment:

# VERSION can be any of the following values: 0.9.2 v1.0.1 1.0.2 1.1.1
VERSION=1.1.1
sudo docker run --privileged --name kunlun1 -p 5401:5401 -p 5402:5402 -p 5403:5403 -p 5404:5404 -p 5405:5405 -itd registry.cn-hangzhou.aliyuncs.com/kunlundb/kunlun:$VERSION  bash -c 'bash /kunlun/start_kunlun.sh'

Once the container is running and its startup script has finished, the demo environment is deployed.

Step 2:

Enter the Klustron database cluster container:

sudo docker exec -it kunlun1 /bin/bash

Check the computing nodes: Run the command shown in the figure below to display the three computing nodes that are already running.

Check the metadata cluster: Run the command shown in the figure below to display the three metadata nodes in the cluster that are already running.

Step 3:

Check the storage nodes:

Note:

  • Port 6004 is the master node of storage node shard 1, while ports 6005 and 6006 are two slave nodes of storage node shard 1.
  • Port 6007 is the master node of storage node shard 2, while ports 6008 and 6009 are two slave nodes of storage node shard 2.
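For scripting against the demo cluster, the port layout in the note above can be captured in a small lookup. This is an illustrative helper with a hypothetical name, `storage_node_role`. It reflects only the initial layout described here, so it assumes no failover has changed which node is the master.

```shell
#!/bin/sh
# Hypothetical lookup of the demo cluster's initial storage-node layout.
# $1: storage node port inside the container
storage_node_role() {
    case "$1" in
        6004)      echo "shard 1 master" ;;
        6005|6006) echo "shard 1 slave" ;;
        6007)      echo "shard 2 master" ;;
        6008|6009) echo "shard 2 slave" ;;
        *)         echo "unknown"; return 1 ;;
    esac
}
```

For example, `storage_node_role 6008` prints `shard 2 slave`.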

Step 4:

Connect to the database so that you can check how tables and partitions are distributed across the storage shards. The default user name and password are both abc.

source env.sh
psql -h 127.0.0.1 -p 5401 -U abc postgres

Step 5:

Create a regular table:

CREATE TABLE testtable1 (id int primary key);

Create a partitioned table:

CREATE TABLE testtable (id int PRIMARY KEY, name char(8)) PARTITION BY HASH (id);

CREATE TABLE testtable_p1 PARTITION OF testtable FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE testtable_p2 PARTITION OF testtable FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE testtable_p3 PARTITION OF testtable FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE testtable_p4 PARTITION OF testtable FOR VALUES WITH (MODULUS 4, REMAINDER 3);
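The MODULUS and REMAINDER clauses decide which partition receives a row: the partition key is hashed, and the hash value modulo 4 selects the partition with the matching remainder. The sketch below shows only that final step. The function name `partition_for_hash` is hypothetical, and it takes an already computed, non-negative hash value as input, because the database applies its own internal hash function to id rather than using the raw value.

```shell
#!/bin/sh
# Hypothetical sketch of MODULUS/REMAINDER routing for the four partitions above.
# $1: non-negative hash value of the partition key (not the raw id)
# $2: modulus, defaulting to 4 as in the CREATE TABLE statements
partition_for_hash() {
    hash_value=$1
    modulus=${2:-4}
    remainder=$((hash_value % modulus))
    # testtable_p1 holds REMAINDER 0, testtable_p2 holds REMAINDER 1, and so on.
    echo "testtable_p$((remainder + 1))"
}
```

A hash value of 10 gives remainder 2, so `partition_for_hash 10` prints `testtable_p3`. Because every possible remainder from 0 to 3 has a partition, every row has exactly one destination.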

Check the table distribution:

SELECT relname, relshardid FROM pg_class WHERE reltype <> 0 AND relname LIKE '%testtable%';
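To run the distribution check from the shell instead of an interactive psql session, the query can be generated by a small helper and passed to psql with its -c option. The helper name `shard_distribution_query` is hypothetical. The table-name fragment is interpolated into the SQL text as-is, so this sketch assumes trusted input.

```shell
#!/bin/sh
# Hypothetical helper that builds the distribution query for a table-name fragment.
# $1: fragment of the table name to match, for example "testtable"
shard_distribution_query() {
    # %% prints a literal % for the LIKE pattern; %s is replaced by the fragment.
    printf "SELECT relname, relshardid FROM pg_class WHERE reltype <> 0 AND relname LIKE '%%%s%%';\n" "$1"
}
```

Inside the container, after sourcing env.sh, it could be used like this:

psql -h 127.0.0.1 -p 5401 -U abc postgres -c "$(shard_distribution_query testtable)"

Note that the pattern also matches testtable1, so the regular table appears in the result alongside the partitions.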

Additional Instructions for Quick Experience

The quick experience environment is a 2-shard MGR cluster. In actual deployments, Klustron can be deployed as a single-shard or multi-shard system, depending on the requirements of the application environment.

If deployed as a single-shard Klustron, the hardware requirements for the environment are similar to those of a single-node MySQL database. However, Klustron offers significant improvements in performance and reliability compared to single-node MySQL databases.

A single-shard Klustron can be dynamically scaled out to a multi-shard cluster to support larger data volumes and higher workloads. Scaling out increases the database's processing capacity linearly, which addresses performance and capacity limits quickly.

The storage shards in the quick experience environment use MGR for data synchronization. For production environments, Klustron recommends strong synchronization (fullsync) instead, to achieve higher performance.
