KlustronDB Backup and Recovery
KlustronDB supports full and incremental physical backups, as well as point-in-time recovery (PiTR) for the entire cluster or for specific shards. Backup and recovery is the foundation of a DBA's disaster recovery capability. DBAs must configure full physical and logical backup tasks correctly and ensure that the storage system holding the backup data is healthy, has sufficient free space, and is highly available. HDFS or a public cloud object storage system is recommended. Starting with KlustronDB version 1.3, Alibaba Cloud and AWS object storage systems are supported for storing backup data. As a best practice, DBAs should run restoration drills regularly to verify that the backup data is valid.
KlustronDB system modules (such as cluster_mgr and nodemgr) automatically perform full physical backups of the entire cluster at regular intervals, copying each shard's data directory to a preconfigured storage system path. The primary node of each shard continuously streams its incremental data update logs (binlogs) to a designated location in the same storage system. To restore to a point in time, a user initiates a PiTR operation via XPanel or the cluster_mgr API. The relevant KlustronDB modules then automatically fetch, from the backup storage configured by the user, the full physical backup closest to the target restoration time and use it to restore either the full cluster or a specified shard. Afterwards, the backed-up binlog files are replayed to complete the point-in-time recovery, restoring the cluster to a consistent state.
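The selection step described above can be sketched in Python. This is a minimal illustration, not KlustronDB's actual implementation: the function names, the catalog layout, and the binlog file names are hypothetical, and it assumes the chosen full backup must not be later than the target time, since binlogs are replayed forward from it.

```python
from bisect import bisect_right
from datetime import datetime


def pick_full_backup(full_backup_times, target):
    """Return the latest full backup taken at or before `target`, or None."""
    times = sorted(full_backup_times)
    idx = bisect_right(times, target)  # number of backups with time <= target
    return times[idx - 1] if idx else None


def binlogs_to_replay(binlogs, base, target):
    """Return names of binlog files whose time range overlaps (base, target].

    `binlogs` is a list of (first_event_time, last_event_time, name) tuples.
    """
    return [name for first, last, name in sorted(binlogs)
            if last > base and first <= target]


# Hypothetical catalog of what the backup storage holds.
full_backups = [datetime(2024, 1, 1), datetime(2024, 1, 8), datetime(2024, 1, 15)]
binlogs = [
    (datetime(2024, 1, 1), datetime(2024, 1, 5), "binlog.000001"),
    (datetime(2024, 1, 5), datetime(2024, 1, 9), "binlog.000002"),
    (datetime(2024, 1, 9), datetime(2024, 1, 12), "binlog.000003"),
    (datetime(2024, 1, 12), datetime(2024, 1, 16), "binlog.000004"),
]

target = datetime(2024, 1, 10, 12, 0)
base = pick_full_backup(full_backups, target)
replay = binlogs_to_replay(binlogs, base, target)
print(base, replay)
```

With this catalog, the January 8 full backup is chosen and only the second and third binlog files need to be replayed. If no full backup precedes the target time, the function returns None and the restore cannot proceed.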
Note that backup and restoration of a KlustronDB cluster must use its built-in physical or logical backup and restore functions. Third-party tools cannot be used because they do not understand the operation and commit state of KlustronDB's global transactions and therefore cannot restore cluster data to a consistent state. Using them may leave partially committed transactions, other erroneous states, or corrupted data.
Fundamental Concepts of Globally Consistent Physical Backup & Recovery
Physical Backup: Backing up the physical files of a database (data files, transaction log files, parameter files). Physical backups can be further divided into offline backups (cold backups) and online backups (hot backups).
KlustronDB clusters support online backup. During the backup process, the database remains operational, and application read and write operations are not blocked. Because the backup runs on a replica node rather than on the primary node, it has essentially no impact on application performance.

1. Backup and Recovery Architecture
Backup and recovery targets: storage shard and metashard
Backup and Restore Scheduling Center: Cluster manager
Backup Data Storage: Backup Storage Pool
Cluster Backup Execution Unit: Node Manager
2. Basic Principles
KlustronDB Cluster Backup & Recovery Working Principle
2.1 Backup Data Targets
In a KlustronDB (Kunlun Database) cluster, the metashard stores the cluster-wide node information, table structure information, transaction information, and backup and recovery information. It is the foundation for the normal operation of the cluster. The metashard uses a high-availability architecture with one primary and two replicas.
The storage shards store business data. The data is distributed across multiple storage shards, each of which has multiple replicas.
The data on the computing nodes is a subset of the metashard data. The computing nodes themselves are stateless and do not need separate backups.
A globally consistent backup can cover all the data of the entire cluster (the metadata and all storage shard data), or only part of it, such as a single storage shard.
2.2 Global Consistency Backup and Recovery
Backup: At the start of a backup, the backup recovery manager obtains the global transaction information of the entire cluster from the metadata database and then starts the backup. For each backup target, both the data files and the transaction logs are backed up, and the copied files are stored centrally in the backup resource pool (storage). The backup ends after all files are copied, and the backup information is recorded in the metadata database. The cluster operates normally during the backup, so there is no need to stop business operations.
Recovery: At the beginning of the recovery process, the backup recovery manager obtains the currently available backup information from the metadata database, then copies data from the backup resource pool to the recovery target. After the data copy is completed, it performs transaction rollback or rollforward according to the transaction logs, thereby restoring the entire cluster to a consistent state.
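The rollback or rollforward decision can be illustrated with a small Python sketch. The text above does not spell out the exact rule, so this assumes a common two-phase-commit convention: a transaction left in the prepared state on a shard is committed (rolled forward) if the global transaction information records it as committed, and rolled back otherwise. The function name, shard names, and transaction numbers are hypothetical.

```python
def resolve_prepared(prepared_by_shard, globally_committed):
    """Decide, per shard, which prepared transactions to commit or roll back.

    prepared_by_shard: dict mapping shard name -> set of prepared transaction IDs
    globally_committed: set of transaction IDs recorded as committed cluster-wide
    """
    plan = {}
    for shard, txn_ids in prepared_by_shard.items():
        plan[shard] = {
            # Recorded as committed globally: roll forward on this shard.
            "commit": sorted(t for t in txn_ids if t in globally_committed),
            # No global commit record: undo the shard's local changes.
            "rollback": sorted(t for t in txn_ids if t not in globally_committed),
        }
    return plan


# Hypothetical state after the data files have been copied back.
prepared = {"shard1": {101, 102}, "shard2": {102, 103}}
committed = {101, 102}

plan = resolve_prepared(prepared, committed)
print(plan)
```

Under this rule, transaction 102 is committed on both shards and transaction 103 is rolled back, so no transaction ends up committed on one shard and missing on another.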
2.3 Backup Types
Full backup: Backs up all data of the backup target (data files and transaction logs) and serves as the base for incremental backups.
Incremental backup: Backs up the data changes and transaction logs produced since the last backup.
Full-cluster backup: Backs up all data of the entire cluster (metadata and all storage shard data).
Partial backup: Backs up only the data of a specific shard.
2.4 Recovery Types
Full recovery: Restores the entire cluster.
Partial recovery: Restores only a specific shard.
Time-based recovery: Restores to a specific point in time (PiTR).
Transaction-based recovery: Recovery based on a specific transaction number. This is a more precise way to specify a point in time, because in a high-load system, thousands of transactions may be committed every second, and the specified point in time may not correspond precisely to a single transaction.
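The difference between the two recovery targets can be shown with a short Python sketch. The commit log below is invented for illustration, and it assumes that transaction numbers increase in commit order and that commit times are recorded with one-second precision.

```python
from datetime import datetime

# Hypothetical commit log: (transaction number, commit time to the second).
commit_log = [
    (5001, datetime(2024, 1, 10, 12, 0, 0)),
    (5002, datetime(2024, 1, 10, 12, 0, 1)),
    (5003, datetime(2024, 1, 10, 12, 0, 1)),
    (5004, datetime(2024, 1, 10, 12, 0, 1)),
    (5005, datetime(2024, 1, 10, 12, 0, 2)),
]


def restore_by_time(log, point):
    """Transactions kept when restoring to a point in time."""
    return [txn for txn, committed_at in log if committed_at <= point]


def restore_by_txn(log, last_txn):
    """Transactions kept when restoring up to a specific transaction number."""
    return [txn for txn, _ in log if txn <= last_txn]


by_time = restore_by_time(commit_log, datetime(2024, 1, 10, 12, 0, 1))
by_txn = restore_by_txn(commit_log, 5003)
print(by_time)
print(by_txn)
```

Three transactions share the timestamp 12:00:01, so a time-based target cannot stop between them, while a transaction-based target can stop exactly after 5003.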
3. Execution Steps
Backup
3.1 Set Backup Strategy
Determine the backup objects and the backup type, and prepare the backup target storage.
3.2 Execute Backup
Schedule backups via the command line or the UI (KlustronDB provides a web interface). The command-line tool accepts the following options:
[kunlun@kunlun-test6 util]$ backup --help
Usage of backup:
  -HdfsNameNodeService string
        specify the hdfs name node service, hdfs://ip:port
  -backuptype string
        back up storage node or 'compute' node,default is 'storage' (default "storage")
  -clustername string
        name of the cluster to be backuped
  -coldstoragetype string
        specify the coldback storage type: hdfs .. (default "hdfs")
  -etcfile string
        path to the etc file of the mysql instance to be backuped,
        if port is specified and the related instance is running,
        the tool will determine the etcfile path
  -port string
        the port of mysql or postgresql instance which to be backuped
  -shardname string
        name of the current shard
  -workdir string
        where store the backup data locally for temp use (default "./data")
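As an illustration of how these options combine, the following Python sketch assembles a backup command line for one storage shard. Only flag names from the help output above are used. The cluster name, shard name, port, and HDFS address are hypothetical placeholders, and whether the tool requires exactly this set of flags is an assumption.

```python
import shlex


def build_backup_argv(cluster, shard, port, hdfs_service, workdir="./data"):
    """Build the argument list for backing up one storage shard to HDFS."""
    return [
        "backup",
        "-backuptype", "storage",
        "-clustername", cluster,
        "-shardname", shard,
        "-port", str(port),
        "-coldstoragetype", "hdfs",
        "-HdfsNameNodeService", hdfs_service,
        "-workdir", workdir,
    ]


# Placeholder values; replace them with those of the real cluster.
argv = build_backup_argv("cluster1", "shard1", 6001, "hdfs://192.0.2.10:9000")
command_line = shlex.join(argv)
print(command_line)
```

The resulting list can be passed to subprocess.run on a host where the backup tool is installed. Building a list rather than a single string avoids shell quoting problems.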
3.3 Check the backup results and confirm that the backup was successful
Restore
3.4 Set Recovery Policy
Confirm the recovery objects and the recovery type, and prepare the recovery target.
3.5 Execute Recovery
The restore command accepts the following options:
[kunlun@kunlun-test6 util]$ restore --help
Usage of restore:
  -HdfsNameNodeService string
        specify the hdfs name node service, hdfs://ip:port
  -enable-globalconsistent
        whether restore the new mysql under global consistent restrict
  -metaclusterconnstr string
        current meta cluster connection string e.g. user:pass@tcp(ip:port)/mysql
  -mysqletcfile string
        etc file of the mysql which to be restored, if port is provied and mysqld is alive ,no need
  -origclustername string
        source cluster name to be restored or backuped
  -origmetaclusterconnstr string
        orig meta cluster connection string e.g. user:pass@tcp(ip:port)/mysql
  -origshardname string
        source shard name to be restored
  -port string
        the port of mysql/postgresql instance which to be restored and needed to be running state
  -restoretime string
        time point the new mysql restore to
  -restoretype string
        restore storage node or 'compute' node,default is 'storage' (default "storage")
  -workdir string
        temporary work path to store the coldback or other type files if needed (default "./data")
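A restore invocation can be assembled in the same spirit. The Python sketch below accepts only the value-taking flags listed in the help output above and rejects anything else, which catches typos before the tool runs. All values are hypothetical placeholders. The format of the -restoretime value is not documented in the help text, so the timestamp format used here is an assumption.

```python
import shlex

# Value-taking flags copied from the restore help output.
RESTORE_FLAGS = {
    "HdfsNameNodeService", "metaclusterconnstr", "mysqletcfile",
    "origclustername", "origmetaclusterconnstr", "origshardname",
    "port", "restoretime", "restoretype", "workdir",
}


def build_restore_argv(options, global_consistent=True):
    """Build the argument list for restore, rejecting unknown flag names."""
    unknown = set(options) - RESTORE_FLAGS
    if unknown:
        raise ValueError("unknown restore flags: " + ", ".join(sorted(unknown)))
    argv = ["restore"]
    if global_consistent:
        argv.append("-enable-globalconsistent")  # boolean flag, takes no value
    for name, value in options.items():
        argv += ["-" + name, str(value)]
    return argv


# Placeholder values; the restoretime format is an assumption.
argv = build_restore_argv({
    "restoretype": "storage",
    "origclustername": "cluster1",
    "origshardname": "shard1",
    "origmetaclusterconnstr": "user:pass@tcp(192.0.2.20:6001)/mysql",
    "metaclusterconnstr": "user:pass@tcp(192.0.2.30:6001)/mysql",
    "port": 6002,
    "restoretime": "2024-01-10 12:00:00",
    "HdfsNameNodeService": "hdfs://192.0.2.10:9000",
})
print(shlex.join(argv))
```

shlex.join quotes the connection strings and the timestamp, because they contain parentheses and a space that a shell would otherwise interpret.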
3.6 Check the recovery results and confirm that the recovery was successful
4. Cluster Backup and Restore Demonstration
This section demonstrates how to perform cluster backup and recovery in the KlustronDB web UI.
Environment information:
The cluster to be backed up consists of one compute node, one shard (one primary and two secondary nodes), and a metashard (one primary and two secondary nodes).

Data status before backup:
Step 1: Start the backup
Backup operation: Click the backup button on the cluster management interface to start the backup.

Step 2: Check the backup status
Depending on the execution time, once the backup succeeds you can see the "backupcluster succeed" message.

Step 3: Restore the cluster
Select the recovery point, and then confirm the recovery.
The cluster has entered recovery mode:
After the recovery is completed, the system will create a new cluster in the available resource area and restore the backed-up data.
Restore status:

Recovered cluster:

Log in to a compute node of the recovered cluster and check the recovered data:

