
KlustronDB Resource Isolation


In a multi-tenant environment, effective resource isolation is essential to keeping each user's instances available. This article introduces KlustronDB's resource isolation mechanism and how to use it.

1. Working Principle

Resource isolation confines each user's instances to a preset resource pool so that instances belonging to different users do not affect one another. The system resources in question are CPU, IO, and physical disk space.

In KlustronDB-1.1.1, CPU resource isolation has been implemented, while isolation for IO and physical disk space will be implemented in subsequent versions. Below is a brief introduction to the principle of CPU resource isolation.

1.1 Cgroup Mechanism

Linux provides the cgroup (control group) mechanism to constrain the resources that processes may use. Each control group contains a set of processes together with any child processes they subsequently create, and resource limits are applied to a control group as a whole.

Control groups form a hierarchy. A child group inherits the resources of its parent group and redistributes them among its own members, within the limits set on the parent.

Linux exposes cgroup settings through a file system interface (/sys/fs/cgroup/**), and KlustronDB uses this interface to implement instance-level resource isolation. Within KlustronDB, cluster_mgr receives user requests related to resource isolation and forwards them to the node_mgr on the corresponding physical machine; node_mgr then calls the cgroup2kunlun tool to apply the isolation parameters.

The key parameters of CPU resource isolation are introduced below.

cpu.cfs_period_us

Specifies the length of the CPU accounting period, in microseconds. At the start of each new period, the CPU quota of the cgroup is replenished.

For example, to let a cgroup use a single CPU for 0.2 seconds out of every second, configure the parameters as follows:

cpu.cfs_quota_us=200000 cpu.cfs_period_us=1000000

The value of cpu.cfs_period_us ranges from 1000 us to 1 s.

cpu.cfs_quota_us (quota mode)

Specifies the total CPU time, in microseconds, that all tasks in a cgroup may use within a single period (as defined by cpu.cfs_period_us).

Once the tasks in a cgroup have exhausted the quota within the current period, they are not allocated any more CPU time for the rest of that period and must wait until a new period begins.
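The relationship between the two parameters can be made concrete with a little arithmetic. The following is a minimal Python sketch, not KlustronDB code: the function names are hypothetical, and the default period of 100000 us is the usual Linux default for cpu.cfs_period_us, assumed here only for illustration. How KlustronDB itself translates a core count into these values is not stated in this article.

```python
def cpu_limit(quota_us: int, period_us: int) -> float:
    """Return the CPU allowance implied by quota/period, in units of one CPU."""
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    return quota_us / period_us


def quota_for_cores(cores: float, period_us: int = 100_000) -> int:
    """Return the cpu.cfs_quota_us value that grants `cores` CPUs per period.

    period_us defaults to 100000 us (an assumption, see the text above).
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return int(round(cores * period_us))


# The example from the text: 0.2 s of one CPU out of every second.
fraction = cpu_limit(quota_us=200_000, period_us=1_000_000)  # 0.2

# A quota larger than the period grants more than one CPU:
# four cores with a 100000 us period need a quota of 400000 us.
four_core_quota = quota_for_cores(4)  # 400000
```

A quota that exceeds the period is valid on a multi-core machine, because the quota counts CPU time summed over all CPUs the cgroup's tasks run on.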

cpu.shares (share mode)

The value of this parameter is an integer that sets the proportion of the machine's CPU time the current cgroup can obtain relative to other cgroups. For example, if there are two cgroups with cpu.shares values of 100 and 200 respectively, then when tasks in both cgroups are CPU-bound, they receive about 1/3 and 2/3 of the machine's CPU time respectively.

There are two points that need to be clarified here:

  1. When only one cgroup is busy and the other cgroups are idle, the busy cgroup can consume all of the machine's CPU.

  2. Each newly added cgroup contributes its own share value to the total, which dilutes the proportion allocated to existing cgroups. In practice, therefore, it is best to set an upper limit for the entire resource pool and to cap the number of cgroups, so that the resource allocation of existing cgroups is not affected.
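Both points follow from how the proportions are computed. The following minimal Python sketch models share mode as a simple ratio over the busy cgroups. It is an illustration of the arithmetic only, not the kernel scheduler, and the function and group names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def share_fractions(
    shares: Dict[str, int], busy: Optional[Iterable[str]] = None
) -> Dict[str, float]:
    """Return the fraction of CPU time each cgroup receives.

    Only busy cgroups compete for CPU; idle ones receive 0.
    If `busy` is omitted, every cgroup is treated as busy.
    """
    active = set(shares) if busy is None else set(busy)
    total = sum(shares[name] for name in active)
    if total == 0:
        return {name: 0.0 for name in shares}
    return {
        name: (shares[name] / total if name in active else 0.0)
        for name in shares
    }


# Two busy cgroups with shares 100 and 200: a gets 1/3, b gets 2/3.
both_busy = share_fractions({"a": 100, "b": 200})

# Point 1: only a is busy, so it can use the whole machine.
only_a = share_fractions({"a": 100, "b": 200}, busy=["a"])

# Point 2: adding cgroup c with 300 shares dilutes a from 1/3 to 1/6.
diluted = share_fractions({"a": 100, "b": 200, "c": 300})
```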

Multithreading

  • The process ID must be written into the cgroup's cgroup.procs file so that the operating system can track the threads created by that process and control their resource usage.
  • Before a cgroup is deleted, the PIDs listed in its cgroup.procs file must be written into the global (root) cgroup.procs file. Otherwise the operating system reports 'device is busy', because every live process must belong to some cgroup and none can be left out.
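The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by node_mgr or cgroup2kunlun: the helper names (write_pid, read_pids, attach_process, drain_group) are hypothetical, each cgroup is assumed to be a directory containing a cgroup.procs file, and the directories are passed in by the caller because the exact mount point depends on the system.

```python
import os
from pathlib import Path
from typing import List


def write_pid(procs_file: Path, pid: int) -> None:
    """Write a single PID to a cgroup.procs file (one PID per write)."""
    fd = os.open(procs_file, os.O_WRONLY)
    try:
        os.write(fd, f"{pid}\n".encode())
    finally:
        os.close(fd)


def read_pids(procs_file: Path) -> List[int]:
    """Return the PIDs currently listed in a cgroup.procs file."""
    return [int(token) for token in procs_file.read_text().split()]


def attach_process(group_dir: Path, pid: int) -> None:
    """Place a process, and the threads it creates, under a cgroup."""
    write_pid(group_dir / "cgroup.procs", pid)


def drain_group(group_dir: Path, root_dir: Path) -> List[int]:
    """Move every process of a cgroup back to the root cgroup.

    Returns the PIDs that were moved. The caller can remove the
    now-empty cgroup afterwards with os.rmdir(group_dir).
    """
    pids = read_pids(group_dir / "cgroup.procs")
    for pid in pids:
        write_pid(root_dir / "cgroup.procs", pid)
    return pids
```

On a real host these calls need sufficient privileges, and group_dir and root_dir would point at directories under the cgroup mount point. Draining the group first is what avoids the busy error when the directory is removed.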

In KlustronDB, there are two ways to configure CPU resource isolation for instances.

The first method is to provide the initial CPU resource configuration when purchasing an instance. If not specified, the default is a quota mode configuration of 4 cores.

The second method is to dynamically adjust resources during instance runtime. This can be done through API calls or manually via XPanel. For specific usage, refer to the KlustronDB API documentation and the XPanel User Manual.
