KlustronDB Backup and Recovery

KlustronDB supports full and incremental physical backups, as well as point-in-time recovery (PiTR) for the entire cluster or for specific shards. Backup and recovery is the foundation of a DBA's disaster recovery capability. DBAs must properly configure full physical and logical backup tasks and ensure that the storage system holding the backup data is functioning properly, has sufficient free space, and is highly available. HDFS or a public cloud object storage system is recommended. Starting with KlustronDB version 1.3, Alibaba Cloud and AWS object storage systems are supported for storing backup data. As a best practice, DBAs should conduct regular restoration drills to verify that the backup data is valid.

KlustronDB system modules (such as cluster_mgr and nodemgr) automatically perform full physical backups of the entire cluster at regular intervals, copying each shard's data directory to a preconfigured storage system path. The primary node of each shard continuously streams its incremental data update logs (binlogs) to a designated location in the same storage system. To restore to a point in time, a user initiates a PiTR operation via XPanel or the cluster_mgr API. The relevant KlustronDB modules then automatically fetch, from the backup storage configured by the user, the full physical backup closest to the target restoration time and use it to restore either the full cluster data or the data of a specified shard. The backed-up binlog files are then applied to complete the point-in-time recovery and bring the cluster to a consistent state.
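The selection step described above can be sketched in a few lines. This is a minimal illustration, not KlustronDB's actual implementation: the function names and the backup timestamps are hypothetical, and it assumes that "closest" means the latest full backup taken at or before the target time, so that binlogs can be replayed forward from it.

```python
from bisect import bisect_right
from datetime import datetime


def pick_base_backup(full_backup_times, target):
    """Return the latest full backup taken at or before `target` (assumed rule)."""
    times = sorted(full_backup_times)
    idx = bisect_right(times, target)  # count of backups with time <= target
    if idx == 0:
        raise ValueError("no full backup exists at or before the target time")
    return times[idx - 1]


def plan_pitr(full_backup_times, target):
    """Describe the restore: the base backup plus the binlog range to replay."""
    base = pick_base_backup(full_backup_times, target)
    return {"base_backup": base, "replay_from": base, "replay_until": target}


# Hypothetical nightly full backups and a target restoration time.
backups = [datetime(2024, 1, 1, 2), datetime(2024, 1, 2, 2), datetime(2024, 1, 3, 2)]
plan = plan_pitr(backups, datetime(2024, 1, 2, 15, 30))
print(plan["base_backup"], "->", plan["replay_until"])
```

The sketch also shows why continuous binlog archiving matters: the replay range between the base backup and the target time must be fully covered by archived binlogs.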

Note that backup and restoration (recovery) of a KlustronDB cluster must use its built-in physical or logical backup and restore functions. Third-party tools cannot be used: they do not understand the execution and commit state of KlustronDB's global transactions and cannot restore cluster data to a consistent state, which may leave partially committed transactions, other erroneous states, or corrupted data.

Fundamental Concepts of Globally Consistent Physical Backup & Recovery

Physical Backup: Backing up the physical files of a database (data files, transaction log files, parameter files). Physical backups can be further divided into offline backups (cold backups) and online backups (hot backups).

KlustronDB clusters support online backup. During the backup process, the database remains operational, and application reads and writes are not blocked. Because the backup runs on a replica (slave) node of each shard rather than on the primary, it has essentially no impact on application performance.

1. Backup and Recovery Architecture

Backup and recovery targets: storage shards and the metashard

Backup and recovery scheduling center: Cluster Manager

Backup data storage: backup storage pool

Cluster backup execution unit: Node Manager

2. Basic Principles

KlustronDB Cluster Backup & Recovery Working Principle

2.1 Backup Data Targets

In a KlustronDB (Kunlun Database) cluster, the metashard stores the node information, table structure information, transaction information, and backup and recovery information of the entire cluster. It is the foundation for the normal operation of the cluster. The metashard uses a high-availability architecture with one primary and two replicas.

Storage shards store the business data. The data is distributed across multiple storage shards, each of which has multiple replicas.

The data on the compute nodes is a subset of the metashard data. The compute nodes themselves are stateless and do not need separate backups.

A globally consistent backup can cover all the data of the entire cluster (all metadata and all storage shard data), or only part of it, such as a single storage shard.

2.2 Globally Consistent Backup and Recovery

Backup: At the beginning of a backup, the backup recovery manager obtains the global transaction information of the entire cluster from the metadata database and then starts the backup. For each backup target, both the data files and the transaction logs are backed up and stored centrally in the backup resource pool (storage). The backup ends after all files are copied, and the backup information is recorded in the metadata database. The cluster operates normally during the backup, so there is no need to stop business operations.

Recovery: At the beginning of a recovery, the backup recovery manager obtains the currently available backup information from the metadata database and then copies data from the backup resource pool to the recovery target. After the data copy is completed, it rolls transactions back or forward according to the transaction logs, thereby restoring the entire cluster to a consistent state.
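The rollback or rollforward decision can be illustrated with a small sketch. It is an assumption-laden model, not KlustronDB's code: it assumes that the metadata database records which global transactions were committed, and that each restored shard may still hold transactions left in a prepared state. The function name, shard names, and transaction IDs are all hypothetical.

```python
def resolve_prepared(committed_global_txns, prepared_by_shard):
    """Decide, per shard, whether each prepared transaction commits or rolls back.

    A global transaction is rolled forward on every shard only if the
    metadata records it as committed (assumed rule); otherwise it is
    rolled back everywhere, so no shard keeps a partial commit.
    """
    decisions = {}
    for shard, txn_ids in prepared_by_shard.items():
        decisions[shard] = {
            txn: "commit" if txn in committed_global_txns else "rollback"
            for txn in txn_ids
        }
    return decisions


# Hypothetical state after the data copy has finished.
committed = {"gtx-101", "gtx-102"}
prepared = {
    "shard1": ["gtx-101", "gtx-103"],
    "shard2": ["gtx-101", "gtx-102"],
}
decisions = resolve_prepared(committed, prepared)
print(decisions)
```

Because every shard consults the same committed set, a given global transaction gets the same outcome on all shards, which is the property a third-party, single-node backup tool cannot guarantee.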

2.3 Backup Types

Full backup: Backs up all data of the backup target (data files and transaction logs), serving as the foundation for incremental backups.

Incremental backup: Backs up the incremental data and transaction logs since the last backup.

Full backup (cluster scope): Backs up all data of the entire cluster (metadata and all storage shard data).

Partial backup: Backs up only the data of a specific shard.

2.4 Recovery Types

Full recovery: Restores the entire cluster.

Partial recovery: Restores only a specific shard.

Time-based recovery: Restores to a specific point in time (PiTR).

Transaction-based recovery: Restores to a specific transaction number. This specifies the recovery point more precisely than a timestamp: in a high-load system, thousands of transactions may be committed every second, so a specified point in time may not correspond to a single transaction.
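A small sketch shows why a transaction number is the more precise target. The commit log below is hypothetical data in a made-up format (commit timestamp in seconds, transaction number); it is not how KlustronDB stores its logs.

```python
def txns_in_second(commits, second):
    """Return every transaction committed within [second, second + 1)."""
    return [txn for ts, txn in commits if second <= ts < second + 1]


def cut_at_txn(commits, txn_id):
    """Return the transactions kept when restoring up to and including `txn_id`."""
    kept = []
    for _, txn in commits:
        kept.append(txn)
        if txn == txn_id:
            return kept
    raise KeyError(f"transaction {txn_id!r} not found in the commit log")


# Hypothetical commit log, ordered by commit time.
commits = [(99.80, 41), (100.10, 42), (100.45, 43), (100.90, 44), (101.20, 45)]

candidates = txns_in_second(commits, 100)  # a one-second target is ambiguous
restored = cut_at_txn(commits, 43)         # a transaction number is exact
print(candidates, restored)
```

A target given to the second matches three transactions here, while naming transaction 43 fixes the cut point unambiguously.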

3. Execution Steps

Backup

3.1 Set Backup Strategy

Determine the backup objects and the backup type, and prepare the backup target storage.

3.2 Execute Backup

Schedule backups from the command line or the UI (KlustronDB provides a web interface).

[kunlun@kunlun-test6 util]$ backup --help
Usage of backup:
  -HdfsNameNodeService string
        specify the hdfs name node service, hdfs://ip:port
  -backuptype string
        back up storage node or 'compute' node,default is 'storage' (default "storage")
  -clustername string
        name of the cluster to be backuped
  -coldstoragetype string
        specify the coldback storage type: hdfs .. (default "hdfs")
  -etcfile string
        path to the etc file of the mysql instance to be backuped,
        if port is specified and the related instance is running,
        the tool will determine the etcfile path
  -port string
        the port of mysql or postgresql instance which to be backuped
  -shardname string
        name of the current shard
  -workdir string
        where store the backup data locally for temp use (default "./data")
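As an illustration of how these flags combine, the following sketch assembles a `backup` command line for one storage shard. It uses only flag names listed in the help output above; the cluster name, shard name, port, and HDFS address are hypothetical placeholder values, and the helper `build_backup_argv` is not part of KlustronDB. The sketch only builds and prints the command; it does not execute it.

```python
import shlex


def build_backup_argv(clustername, shardname, port, hdfs_namenode,
                      backuptype="storage", workdir="./data"):
    """Assemble the argument list for the `backup` tool from documented flags."""
    return [
        "backup",
        "-clustername", clustername,
        "-shardname", shardname,
        "-port", str(port),
        "-backuptype", backuptype,          # 'storage' is the documented default
        "-coldstoragetype", "hdfs",         # documented default storage type
        "-HdfsNameNodeService", hdfs_namenode,
        "-workdir", workdir,                # local temporary directory
    ]


# Placeholder values for illustration only.
argv = build_backup_argv("cluster1", "shard1", 6001, "hdfs://192.168.0.10:9000")
print(shlex.join(argv))
```

Keeping the arguments as a list avoids shell quoting problems if the command is later passed to `subprocess.run`.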

3.3 Check the backup results and confirm that the backup was successful

Restore

3.4 Set Recovery Policy

Confirm the recovery object and the recovery type, and prepare the recovery target.

3.5 Execute Recovery

Usage of the restore command:

[kunlun@kunlun-test6 util]$ restore --help
Usage of restore:
  -HdfsNameNodeService string
        specify the hdfs name node service, hdfs://ip:port
  -enable-globalconsistent
        whether restore the new mysql under global consistent restrict
  -metaclusterconnstr string
        current meta cluster connection string e.g. user:pass@tcp(ip:port)/mysql
  -mysqletcfile string
        etc file of the mysql which to be restored, if port is provied and mysqld is alive ,no need
  -origclustername string
        source cluster name to be restored or backuped
  -origmetaclusterconnstr string
        orig meta cluster connection string e.g. user:pass@tcp(ip:port)/mysql
  -origshardname string
        source shard name to be restored
  -port string
        the port of mysql/postgresql instance which to be restored and needed to be running state
  -restoretime string
        time point the new mysql restore to
  -restoretype string
        restore storage node or 'compute' node,default is 'storage' (default "storage")
  -workdir string
        temporary work path to store the coldback or other type files if needed (default "./data")
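The sketch below assembles a `restore` command line and rejects any flag that the help output above does not list. The helper `build_restore_argv` and all option values (names, addresses, credentials) are hypothetical. In particular, the help text does not state the expected format of `-restoretime`, so the timestamp shown is an assumed format. The sketch only builds the command; it does not execute it.

```python
# Flags taken from the `restore --help` output; nothing else is accepted.
VALUE_FLAGS = {
    "HdfsNameNodeService", "metaclusterconnstr", "mysqletcfile",
    "origclustername", "origmetaclusterconnstr", "origshardname",
    "port", "restoretime", "restoretype", "workdir",
}
BOOL_FLAGS = {"enable-globalconsistent"}


def build_restore_argv(options):
    """Turn an options mapping into an argument list for the `restore` tool."""
    argv = ["restore"]
    for name, value in options.items():
        if name in BOOL_FLAGS:
            if value:                      # boolean flags take no value
                argv.append(f"-{name}")
        elif name in VALUE_FLAGS:
            argv += [f"-{name}", str(value)]
        else:
            raise ValueError(f"flag -{name} is not listed in `restore --help`")
    return argv


# Placeholder values for illustration only.
restore_argv = build_restore_argv({
    "origclustername": "cluster1",
    "origshardname": "shard1",
    "origmetaclusterconnstr": "user:pass@tcp(192.168.0.11:6001)/mysql",
    "metaclusterconnstr": "user:pass@tcp(192.168.0.21:6001)/mysql",
    "port": 6004,
    "restoretime": "2024-01-02 15:30:00",  # assumed format, not documented
    "enable-globalconsistent": True,
    "HdfsNameNodeService": "hdfs://192.168.0.10:9000",
})
print(restore_argv)
```

Validating flag names up front catches typos before the tool is invoked against a running instance.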

3.6 Check the recovery results and confirm that the recovery was successful

4. Cluster Backup and Restore Demonstration

This section demonstrates how to perform cluster backup and recovery in the KlustronDB web UI.

Environment information:

The cluster to be backed up consists of one compute node, one storage shard (one primary and two secondary nodes), and a metashard (one primary and two secondary nodes).

Data status before backup:


Step 1: Start the backup

Backup operation: Click the backup button on the cluster management interface to start the backup.

Step 2: Check the backup status

Locate the backup task by its execution time. Once the backup succeeds, the message "backupcluster succeed" is displayed.

Step 3: Restore the cluster

Select the recovery point, and then confirm the recovery.

The cluster has entered recovery mode:


When the recovery completes, the system has created a new cluster in the available resource area and restored the backed-up data into it.

Restore status:

Recovered cluster:

Enter the recovered cluster computing node and check the recovered data:

After the cluster is restored, the corresponding data is also correctly restored.
