Klustron Table Redistribution Use Case
Klustron Table Redistribution Use Case
Note:
Unless specified otherwise, any version number mentioned in the document can be replaced by any released version number. For a list of all released versions, refer to: Release_notes
Objective:
This article demonstrates Klustron's table redistribution functionality using the command line tools and XPanel, guiding customers on how to convert regular tables into partitioned tables. We simulated a scenario of redistributing tables under an online application environment, testing the impact of table redistribution on online applications and verifying data integrity.
In real-world business scenarios, as business requirements and table data volumes change, the way the tables were originally created may no longer be appropriate. Take, for example, the sales_order
table mentioned in this article. At the time of its creation, due to modest business volume, it was set up as a non-partitioned table. However, with the company's growth, the number of records in the table has surged. As a result, there's a need to transform it into a partitioned table to facilitate data archiving in the future and to boost application access speed.
Specific environment details are as follows:
Node Type | IP | Port |
---|---|---|
Compute Node | 192.168.40.152 | 47002 |
Shard1 Master Node | 192.168.40.152 | 57003 |
Shard2 Master Node | 192.168.26.153 | 57005 |
XPanel | 192.168.40.151 | 18080 |
01 Environment Setup
Log into the computing node using the PG client and create a user and database.
psql -h 192.168.40.152 -p 47001 postgres
create user kunlun_test with password 'kunlun';
create database test_db with owner kunlun_test encoding utf8 template template0;
\q
psql -h 192.168.40.152 -p 47001 -U kunlun_test test_db
Create the sales_order
table and load data using a stored procedure.
create table sales_order
(
order_number INT NOT NULL AUTO_INCREMENT,
customer_number INT NOT NULL,
product_code INT NOT NULL,
order_date DATETIME NOT NULL,
entry_date DATETIME NOT NULL,
order_amount DECIMAL(18,2) NOT NULL,
primary key(order_number,order_date)
) ;
create or replace procedure generate_order_data()
AS $$
DECLARE
v_customer_number integer;
v_product_code integer;
v_order_date date;
v_amount integer;
start_date date := to_date('2021-01-01','yyyy-mm-dd');
i integer :=1;
BEGIN
while i<=10000 loop
v_customer_number := FLOOR(1+RANDOM()*6);
v_product_code := FLOOR(1+RANDOM()*500);
v_order_date := to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT);
v_amount := FLOOR(1000+RANDOM()*9000);
INSERT INTO sales_order VALUES (i,v_customer_number,v_product_code,v_order_date,v_order_date,v_amount);
commit;
i := i+1;
end loop;
END; $$
LANGUAGE plpgsql;
set statement_timeout=0;
call generate_order_data();
test_db=> select count(*) from sales_order;
count
-------
10000
(1 row)
Create the target table sales_order_new
as a range partitioned table based on order time.
create table sales_order_new
(
order_number INT NOT NULL AUTO_INCREMENT,
customer_number INT NOT NULL,
product_code INT NOT NULL,
order_date DATETIME NOT NULL,
entry_date DATETIME NOT NULL,
order_amount DECIMAL(18,2) NOT NULL,
primary key(order_number,order_date)
) partition by range(order_date);
create table sales_order_p0 partition of sales_order_new
for values from ('2021-01-01') to ('2021-05-01');
create table sales_order_p1 partition of sales_order_new
for values from ('2021-05-01') to ('2021-09-01');
create table sales_order_p2 partition of sales_order_new
for values from ('2021-09-01') to ('2022-01-01');
kinsert.py
to simulate continuous insert operations on the sales_order
table.
02 Prepare a Python script import psycopg2.extras
import time
conn = psycopg2.connect(database='test_db',user='kunlun_test',
password='kunlun',host='192.168.40.152',port='47001')
cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
# insert sales_order table
insert_sql = '''INSERT INTO sales_order VALUES (%s,
FLOOR(1+RANDOM()*6), FLOOR(1+RANDOM()*3),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
FLOOR(1000+RANDOM()*9000));
;'''
#print("Press Enter to continue.")
input('Press any key and Enter to continue ~!')
i = 10001
while (i < 20001) :
cursor.execute(insert_sql,[i])
conn.commit()
print("order_number:",i," is inserted.")
i = i+1
cursor.close()
conn.close()
The application simulates inserts starting from order_number 10001
. After the program runs, it will wait for user input before executing the actual insert logic. Once the redistribution task is initiated in the XPanel interface, immediately resume the program's execution.
[klbase@server-0 ~]$ python kinsert.py
Press any key and Enter to continue ~!
03 Conducting Table Redistribution in the XPanel Interface
In the "Cluster List Info" page, click on "Settings".
In the left column, click on “Table Redistribution”.
Select the "Target Table Cluster", "Source Table", "Target Table", and the strategy to delete the source table. If you choose the "Automatic" strategy, the source table will be retained for 7 days by default. If you select "Manual", users will need to delete the source table themselves.
Click on "Submit". Simultaneously, switch to the terminal where the Python program kinsert.py is running and press Enter to continue inserting records.
[klbase@server-0 ~]$ python kinsert.py
Press any key and Enter to continue ~!
order_number: 10001 is inserted.
order_number: 10002 is inserted.
order_number: 10003 is inserted.
order_number: 10004 is inserted.
order_number: 10005 is inserted.
order_number: 10006 is inserted.
order_number: 10007 is inserted.
order_number: 10008 is inserted.
order_number: 10009 is inserted.
order_number: 10010 is inserted.
order_number: 10011 is inserted.
order_number: 10012 is inserted.
order_number: 10013 is inserted.
order_number: 10014 is inserted.
order_number: 10015 is inserted.
order_number: 10016 is inserted.
After a short while, you will encounter an insertion error since the source table sales_order
has been renamed.
order_number: 10281 is inserted.
order_number: 10282 is inserted.
order_number: 10283 is inserted.
order_number: 10284 is inserted.
order_number: 10285 is inserted.
order_number: 10286 is inserted.
order_number: 10287 is inserted.
order_number: 10288 is inserted.
order_number: 10289 is inserted.
order_number: 10290 is inserted.
order_number: 10291 is inserted.
order_number: 10292 is inserted.
order_number: 10293 is inserted.
order_number: 10294 is inserted.
order_number: 10295 is inserted.
order_number: 10296 is inserted.
order_number: 10297 is inserted.
Traceback (most recent call last):
File "kinsert.py", line 22, in <module>
cursor.execute(insert_sql,[i])
File "/usr/local/lib64/python3.6/site-packages/psycopg2/extras.py", line 236, in execute
return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "sales_order" does not exist
LINE 1: INSERT INTO sales_order VALUES (10298,
At this point, the order_number
being inserted by the online application is 10297. When the table redistribution is executed, the sales_order
is renamed, causing the application's insertion to fail.
03 Examining the Target and Source Tables
Returning to the PG client, we inspect the target and source tables. We find that the source table has now become a partitioned table, while the target table has been renamed to __sales_order$$tb_repartition_13 based on the "Delete Source Table" strategy.
The naming convention is: _[SourceTableName]tb_repartition[TaskNumber]. After 7 days, the system will automatically delete the table named __sales_ordertb_repartition_13.
Upon querying the data within sales_order, the data volume aligns perfectly with the insertion point just before the application error occurred. After initiating the table redistribution, the online application still managed to insert 297 records.
test_db=> select count(*) from sales_order;
count
-------
10297
(1 row)