Skip to main content

Klustron Table Redistribution Use Case

KlustronAbout 4 min

Klustron Table Redistribution Use Case

Note:

Unless specified otherwise, any version number mentioned in the document can be replaced by any released version number. For a list of all released versions, refer to: Release_notes

Objective:

This article demonstrates Klustron's table redistribution functionality using the command line tools and XPanel, guiding customers on how to convert regular tables into partitioned tables. We simulated a scenario of redistributing tables under an online application environment, testing the impact of table redistribution on online applications and verifying data integrity.

In real-world business scenarios, as business requirements and table data volumes change, the way the tables were originally created may no longer be appropriate. Take, for example, the sales_order table mentioned in this article. At the time of its creation, due to modest business volume, it was set up as a non-partitioned table. However, with the company's growth, the number of records in the table has surged. As a result, there's a need to transform it into a partitioned table to facilitate data archiving in the future and to boost application access speed.

Specific environment details are as follows:

Node TypeIPPort
Compute Node192.168.40.15247002
Shard1 Master Node192.168.40.15257003
Shard2 Master Node192.168.26.15357005
XPanel192.168.40.15118080

01 Environment Setup

Log into the computing node using the PG client and create a user and database.

psql -h 192.168.40.152 -p 47001 postgres
create user kunlun_test with password 'kunlun';
create database test_db with owner kunlun_test encoding utf8 template template0;
\q
psql -h 192.168.40.152 -p 47001 -U kunlun_test test_db

Create the sales_order table and load data using a stored procedure.

create table sales_order
(
   order_number         INT NOT NULL AUTO_INCREMENT,
   customer_number      INT NOT NULL,
   product_code         INT NOT NULL,
   order_date           DATETIME NOT NULL,
   entry_date           DATETIME NOT NULL,
   order_amount         DECIMAL(18,2) NOT NULL,
   primary key(order_number,order_date)
)  ;

create or replace procedure generate_order_data()
AS $$
DECLARE
  v_customer_number integer;
  v_product_code integer;
  v_order_date date;
  v_amount integer;
  start_date date := to_date('2021-01-01','yyyy-mm-dd');
  i integer :=1;
BEGIN
	while i<=10000 loop
		v_customer_number := FLOOR(1+RANDOM()*6);
		v_product_code := FLOOR(1+RANDOM()*500);
		v_order_date := to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT);
		v_amount := FLOOR(1000+RANDOM()*9000);
		INSERT INTO sales_order VALUES (i,v_customer_number,v_product_code,v_order_date,v_order_date,v_amount);
		commit;
		i := i+1;
	end loop;
END; $$
LANGUAGE plpgsql;
set statement_timeout=0;
call generate_order_data();
test_db=> select count(*) from sales_order;
 count 
-------
 10000
(1 row)

Create the target table sales_order_new as a range partitioned table based on order time.

create table sales_order_new
(
   order_number         INT NOT NULL AUTO_INCREMENT,
   customer_number      INT NOT NULL,
   product_code         INT NOT NULL,
   order_date           DATETIME NOT NULL,
   entry_date           DATETIME NOT NULL,
   order_amount         DECIMAL(18,2) NOT NULL,
   primary key(order_number,order_date)
) partition by range(order_date);
create table sales_order_p0 partition of sales_order_new
for values from ('2021-01-01') to ('2021-05-01');
create table sales_order_p1 partition of sales_order_new
for values from ('2021-05-01') to ('2021-09-01');
create table sales_order_p2 partition of sales_order_new
for values from ('2021-09-01') to ('2022-01-01');

02 Prepare a Python script kinsert.py to simulate continuous insert operations on the sales_order table.

import psycopg2.extras
import time

conn = psycopg2.connect(database='test_db',user='kunlun_test',
                 password='kunlun',host='192.168.40.152',port='47001')


cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

# insert sales_order table
insert_sql = '''INSERT INTO sales_order VALUES (%s,
FLOOR(1+RANDOM()*6), FLOOR(1+RANDOM()*3),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
FLOOR(1000+RANDOM()*9000));
;'''

#print("Press Enter to continue.")
input('Press any key and Enter to continue ~!')
i = 10001
while (i < 20001) :
  cursor.execute(insert_sql,[i])
  conn.commit()
  print("order_number:",i," is inserted.")
  i = i+1

cursor.close()
conn.close()

The application simulates inserts starting from order_number 10001. After the program runs, it will wait for user input before executing the actual insert logic. Once the redistribution task is initiated in the XPanel interface, immediately resume the program's execution.

[klbase@server-0 ~]$ python kinsert.py 
Press any key and Enter to continue ~!

03 Conducting Table Redistribution in the XPanel Interface

In the "Cluster List Info" page, click on "Settings".

In the left column, click on “Table Redistribution”.

Select the "Target Table Cluster", "Source Table", "Target Table", and the strategy to delete the source table. If you choose the "Automatic" strategy, the source table will be retained for 7 days by default. If you select "Manual", users will need to delete the source table themselves.

Click on "Submit". Simultaneously, switch to the terminal where the Python program kinsert.py is running and press Enter to continue inserting records.

[klbase@server-0 ~]$ python kinsert.py 
Press any key and Enter to continue ~!
order_number: 10001  is inserted.
order_number: 10002  is inserted.
order_number: 10003  is inserted.
order_number: 10004  is inserted.
order_number: 10005  is inserted.
order_number: 10006  is inserted.
order_number: 10007  is inserted.
order_number: 10008  is inserted.
order_number: 10009  is inserted.
order_number: 10010  is inserted.
order_number: 10011  is inserted.
order_number: 10012  is inserted.
order_number: 10013  is inserted.
order_number: 10014  is inserted.
order_number: 10015  is inserted.
order_number: 10016  is inserted.

After a short while, you will encounter an insertion error since the source table sales_order has been renamed.

order_number: 10281  is inserted.
order_number: 10282  is inserted.
order_number: 10283  is inserted.
order_number: 10284  is inserted.
order_number: 10285  is inserted.
order_number: 10286  is inserted.
order_number: 10287  is inserted.
order_number: 10288  is inserted.
order_number: 10289  is inserted.
order_number: 10290  is inserted.
order_number: 10291  is inserted.
order_number: 10292  is inserted.
order_number: 10293  is inserted.
order_number: 10294  is inserted.
order_number: 10295  is inserted.
order_number: 10296  is inserted.
order_number: 10297  is inserted.
Traceback (most recent call last):
  File "kinsert.py", line 22, in <module>
    cursor.execute(insert_sql,[i])
  File "/usr/local/lib64/python3.6/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "sales_order" does not exist
LINE 1: INSERT INTO sales_order VALUES (10298,

At this point, the order_number being inserted by the online application is 10297. When the table redistribution is executed, the sales_order is renamed, causing the application's insertion to fail.

03 Examining the Target and Source Tables

Returning to the PG client, we inspect the target and source tables. We find that the source table has now become a partitioned table, while the target table has been renamed to __sales_order$$tb_repartition_13 based on the "Delete Source Table" strategy.

The naming convention is: _[SourceTableName]tb_repartition[TaskNumber]. After 7 days, the system will automatically delete the table named __sales_ordertb_repartition_13.

Upon querying the data within sales_order, the data volume aligns perfectly with the insertion point just before the application error occurred. After initiating the table redistribution, the online application still managed to insert 297 records.

test_db=> select count(*) from sales_order;
 count 
-------
 10297
(1 row)

END