KlustronDB vs Greenplum TPC-H Performance Comparison Test
KlustronDB vs Greenplum TPC-H Performance Comparison Test
Test Overview
Using the TPC-H testing tool, tests were conducted on KlustronDB v1.3.1 and Greenplum Database 7.0.0 versions to obtain performance results. The results show that when the data volume is 100G, KlustronDB has an 8-fold performance advantage over Greenplum; when the data volume is 200G, KlustronDB has a 3.5-fold performance advantage over Greenplum; and when the data volume is 500G, KlustronDB has a 1.5-fold performance advantage over Greenplum.
Test environment
Server specifications configuration
There are three machines in total: 192.168.0.17, 192.168.0.18, 192.168.0.19, and the configuration of each machine is as follows:
- Operating System: CentOS Linux release 8.5.2111
- CPU: AMD Ryzen 9 7950X 16-Core Processor, Threads per core: 2, 32 cores
- Memory: 128GB
- Disk: NVMe 1.8T *1
- Network: Speed: 10000Mb/s
Software version
| Software Name | Software Version |
|---|---|
| KlustronDB | v1.3.1 |
| Greenplum | v7.0.0 |
| TPC-H | v3.0.0 |
Cluster environment
KlustronDB
Since TPC-H is a read-only performance test, this cluster does not have standby nodes deployed, and fullsync is turned off (enable_fullsync=false), and the flush parameters are set to
innodb_flush_log_at_trx_commit=2;
sync_binlog=0;
This will not affect the accuracy and fairness of the test results, nor will it provide any advantageous performance benefits to Klustron. KlustronDB's XPanel component, the Cluster_mgr cluster, and the metashard_cluster are also deployed on these three servers. The performance overhead of these components is very small, and they will not significantly consume server resources or noticeably reduce KlustronDB's TPC-H performance.
| Node Type | IP | PORT |
|---|---|---|
| Compute Node | 192.168.0.19 | 23001 |
| Storage-shard1-Master | 192.168.0.17 | 33501 |
| Storage-shard2-Master | 192.168.0.18 | 33503 |
| Storage-shard3-Master | 192.168.0.19 | 33505 |
| Cluster_mgr | 192.168.0.17/18/19 | 23501 |
| Metadata_cluster | 192.168.0.17/18/19 | 23301 |
| XPanel | 192.168.0.18 | 10024 |
Cluster Description:
- Compute Node: Deploy a compute node and load the Tornado execution engine, which can improve the performance of analytical queries by tens of times, thereby bringing a huge leap in the performance of KlustronDB's TPC-H/TPC-DS benchmarks.
- Storage nodes: 3 shards, each shard has a single primary, and the single primaries of the three shards are distributed across these three machines.
- Management nodes and metashard: The management cluster consists of three machines, forming 3 nodes, with 1 primary and 2 backups.
Greenplum
| Node Type | IP | PORT |
|---|---|---|
| Master | 192.168.0.17 | 54321 |
| Segment1 | 192.168.0.18 | 6000 |
| Segment2 | 192.168.0.19 | 6000 |
Configuration parameters
KlustronDB
Compute Node:
alter system set statement_timeout=6000000;
alter system set mysql_read_timeout=1200;
alter system set mysql_write_timeout=1200;
alter system set lock_timeout=1200000;
alter system set log_min_duration_statement=1200000;
alter system set effective_cache_size = '48GB';
alter system set work_mem = '5GB';
alter system set wal_buffers='64MB';
alter system set extension.aggregate_ignore_work_mem=false;
alter system set enable_nestloop=false;
alter system set extension.max_custom_indexscan_parallelism=16;
alter system set extension.custom_hashjoin_parallelism=32;
Note: A restart of the compute nodes is required after adjustments.
Storage Node:
mysql -h 192.168.0.17 -P 33501 -upgx -ppgx_pwd
mysql -h 192.168.0.18 -P 33503 -upgx -ppgx_pwd
mysql -h 192.168.0.19 -P 33505 -upgx -ppgx_pwd
Set the following parameters separately:
set persist innodb_buffer_pool_size=64*1024*1024*1024;
set persist lock_wait_timeout=1200;
set persist innodb_lock_wait_timeout=1200;
set persist fullsync_timeout=1200000;
set persist enable_fullsync=false;
set persist innodb_flush_log_at_trx_commit=2;
set persist sync_binlog=0;
set persist max_binlog_size=1*1024*1024*1024;
Greenplum
Turn off auto_vacuum to prevent the vacuum process from suddenly interfering and affecting TPC-H performance.
Execute on the Master node
gpconfig -c shared_buffers -v 48GB
gpconfig -c statement_timeout -v 6000000
gpconfig -c lock_timeout -v 1200000
gpconfig -c log_min_duration_statement -v 1200000
gpconfig -c effective_cache_size -v 48GB
gpconfig -c work_mem -v 5GB
gpconfig -c wal_buffers -v 64MB
gpconfig -c autovacuum -v false
gpstop -M fast
gpstart
Test Results
TPC-H-100G
TPC-H data volume 100G, the test results are the shorter time of the two tests, unit: seconds.
| QUERY ID | KlustronDB v1.3.1 | Greenplum v7.0.0 |
|---|---|---|
| Q1 | 4.48 | 248.16 |
| Q2 | 8.05 | 23.57 |
| Q3 | 10.09 | 100.23 |
| Q4 | 5.06 | 156.42 |
| Q5 | 22.33 | 100.03 |
| Q6 | 2.95 | 36.49 |
| Q7 | 12.36 | 62.98 |
| q9 | 29.43 | 129.88 |
| Q10 | 10.61 | 73.09 |
| Q11 | 2.6 | 6.84 |
| Q12 | 6.99 | 71.68 |
| Q13 | 15.73 | 73.93 |
| Q14 | 4.04 | 39.68 |
| Q15 | 7.57 | 96.99 |
| Q16 | 3.76 | 20.25 |
| Q17 | 15.41 | 429.69 |
| Q18 | 39.05 | 320.23 |
| Q19 | 8.37 | 44.28 |
| Q20 | 13.57 | 75.38 |
| Q21 | 47.8 | 203.55 |
| Q22 | 6.29 | 63.47 |
| Total time consumption | 307.9 | 2439.98 |

TPC-H-200G
TPC-H data volume 200G, the test results are the shorter time of the two tests, unit: seconds.
| QUERY ID | KlustronDB v1.3.1 | Greenplum v7.0.0 |
|---|---|---|
| Q1 | 44.39 | 515.72 |
| Q2 | 26.95 | 55.25 |
| Q3 | 55.93 | 205.8 |
| Q4 | 56.83 | 339.53 |
| Q5 | 114.12 | 220.61 |
| Q6 | 45.47 | 83.78 |
| Q7 | 70.03 | 156.95 |
| q9 | 111.07 | 370.88 |
| Q10 | 63.72 | 177.11 |
| Q11 | 14.99 | 19.48 |
| Q12 | 64.27 | 159.33 |
| Q13 | 33.62 | 144.22 |
| Q14 | 53.79 | 98.42 |
| Q15 | 102.97 | 216.4 |
| Q16 | 17.49 | 47.68 |
| Q17 | 96.64 | 911.3 |
| Q18 | 177.66 | 675.87 |
| Q19 | 52.49 | 99.26 |
| Q20 | 78.37 | 163.19 |
| Q21 | 218.36 | 494.77 |
| Q22 | 20.24 | 133.21 |
| Total time consumption | 1631.57 | 5489.93 |

TPC-H-500G
TPC-H data volume 500G, the test results are the shorter time of the two tests, unit: seconds.
| QUERY ID | KlustronDB v1.3.1 | Greenplum v7.0.0 |
|---|---|---|
| Q1 | 88.79 | 515.72 |
| Q2 | 167.71 | 55.25 |
| Q3 | 424.11 | 205.8 |
| Q4 | 162.17 | 339.53 |
| Q5 | 465.63 | 220.61 |
| Q6 | 132.41 | 83.78 |
| Q7 | 209.25 | 156.95 |
| q9 | 717.72 | 370.88 |
| Q10 | 266.68 | 177.11 |
| Q11 | 135.14 | 19.48 |
| Q12 | 128.81 | 159.33 |
| Q13 | 178.35 | 144.22 |
| Q14 | 174.23 | 98.42 |
| Q15 | 330.5 | 216.4 |
| Q16 | 641.42 | 47.68 |
| Q17 | 445.27 | 911.3 |
| Q18 | 4191.07 | 675.87 |
| Q19 | 163.6 | 99.26 |
| Q20 | 610.63 | 163.19 |
| Q21 | ||
| 494.77 | ||
| Q22 | 95.79 | 133.21 |
| Total time consumption | 10197.26 | 16281.12 |

Create Table Statement
KlustronDB
KlustronDB-1.4 Table Creation Syntax
Note that the syntax used in this section applies to Klustron-1.4. For older versions, equivalent syntax must be used to create these tables. The key point is to create the tables PART, SUPPLIER, PARTSUPP, CUSTOMER, ORDERS, and LINEITEM as hash-partitioned tables, with each table having three partitions (the number of shards in the KlustronDB cluster for this test is 3. If the user's KlustronDB cluster has more shards, these tables can be created with more partitions, which will result in higher test performance), and the partitions with the same hash value in each table occupy the same shard. For example, partitions with hash values 0, 1, and 2 are located in shard 1, shard 2, and shard 3, respectively.
This test result applies to both KlustronDB-1.4.1 and KlustronDB-1.3.1, with only minimal performance differences between the two.
create TABLEGROUP tpch10 partitions 3 (
partition tpch10_auto_p0 with(hash=0,shard=1),
partition tpch10_auto_p1 with(hash=1,shard=2),
partition tpch10_auto_p2 with(hash=2,shard=3)
);
CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
N_NAME CHAR(25) NOT NULL,
N_REGIONKEY INTEGER NOT NULL,
N_COMMENT VARCHAR(152),
PRIMARY KEY (n_nationkey)) WITH(SHARD=all);
CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
R_NAME CHAR(25) NOT NULL,
R_COMMENT VARCHAR(152),
PRIMARY KEY (r_regionkey)) WITH (SHARD=all);
CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
P_NAME VARCHAR(55) NOT NULL,
P_MFGR CHAR(25) NOT NULL,
P_BRAND CHAR(10) NOT NULL,
P_TYPE VARCHAR(25) NOT NULL,
P_SIZE INTEGER NOT NULL,
P_CONTAINER CHAR(10) NOT NULL,
P_RETAILPRICE double NOT NULL,
P_COMMENT VARCHAR(23) NOT NULL,
PRIMARY KEY (p_partkey)) PARTITION BY HASH (p_partkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
S_NAME CHAR(25) NOT NULL,
S_ADDRESS VARCHAR(40) NOT NULL,
S_NATIONKEY INTEGER NOT NULL,
S_PHONE CHAR(15) NOT NULL,
S_ACCTBAL double NOT NULL,
S_COMMENT VARCHAR(101) NOT NULL,
PRIMARY KEY (s_suppkey)) PARTITION BY HASH (s_suppkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
PS_SUPPLYCOST double NOT NULL,
PS_COMMENT VARCHAR(199) NOT NULL,
PRIMARY KEY (ps_partkey, ps_suppkey)) PARTITION BY HASH (ps_partkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL double NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL,
PRIMARY KEY(c_custkey)) PARTITION BY HASH (C_CUSTKEY) partitions 3 TABLEGROUP tpch10;
CREATE TABLE ORDERS ( O_ORDERKEY INTEGER NOT NULL,
O_CUSTKEY INTEGER NOT NULL,
O_ORDERSTATUS CHAR(1) NOT NULL,
O_TOTALPRICE double NOT NULL,
O_ORDERDATE DATE NOT NULL,
O_ORDERPRIORITY CHAR(15) NOT NULL,
O_CLERK CHAR(15) NOT NULL,
O_SHIPPRIORITY INTEGER NOT NULL,
O_COMMENT VARCHAR(79) NOT NULL,
PRIMARY KEY(o_orderkey)) PARTITION BY HASH (o_orderkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY double NOT NULL,
L_EXTENDEDPRICE double NOT NULL,
L_DISCOUNT double NOT NULL,
L_TAX double NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL,
PRIMARY KEY(l_orderkey, l_linenumber)) PARTITION BY HASH (l_orderkey) partitions 3 TABLEGROUP tpch10;
KlustronDB-1.3 and older versions' table creation syntax
CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
N_NAME CHAR(25) NOT NULL,
N_REGIONKEY INTEGER NOT NULL,
N_COMMENT VARCHAR(152),
PRIMARY KEY (n_nationkey)) WITH(SHARD=all);
CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
R_NAME CHAR(25) NOT NULL,
R_COMMENT VARCHAR(152),
PRIMARY KEY (R_REGIONKEY)) WITH(SHARD=all);
CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
P_NAME VARCHAR(55) NOT NULL,
P_MFGR CHAR(25) NOT NULL,
P_BRAND CHAR(10) NOT NULL,
P_TYPE VARCHAR(25) NOT NULL,
P_SIZE INTEGER NOT NULL,
P_CONTAINER CHAR(10) NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT VARCHAR(23) NOT NULL,
PRIMARY KEY (P_PARTKEY)) PARTITION BY HASH (P_PARTKEY);
CREATE TABLE PART_0 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE PART_1 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE PART_2 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
S_NAME CHAR(25) NOT NULL,
S_ADDRESS VARCHAR(40) NOT NULL,
S_NATIONKEY INTEGER NOT NULL,
S_PHONE CHAR(15) NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT VARCHAR(101) NOT NULL,
PRIMARY KEY (S_SUPPKEY)) PARTITION BY HASH (S_SUPPKEY);
CREATE TABLE SUPPLIER_0 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE SUPPLIER_1 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE SUPPLIER_2 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT VARCHAR(199) NOT NULL,
PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY)) PARTITION BY HASH (PS_PARTKEY);
CREATE TABLE PARTSUPP_0 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE PARTSUPP_1 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE PARTSUPP_2 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL,
PRIMARY KEY(C_CUSTKEY)) PARTITION BY HASH (C_CUSTKEY);
CREATE TABLE CUSTOMER_0 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE CUSTOMER_1 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE CUSTOMER_2 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE ORDERS ( O_ORDERKEY INTEGER NOT NULL,
O_CUSTKEY INTEGER NOT NULL,
O_ORDERSTATUS CHAR(1) NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE DATE NOT NULL,
O_ORDERPRIORITY CHAR(15) NOT NULL,
O_CLERK CHAR(15) NOT NULL,
O_SHIPPRIORITY INTEGER NOT NULL,
O_COMMENT VARCHAR(79) NOT NULL,
PRIMARY KEY(O_ORDERKEY)) PARTITION BY HASH (O_ORDERKEY);
CREATE TABLE ORDERS_0 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE ORDERS_1 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE ORDERS_2 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL,
PRIMARY KEY(L_ORDERKEY, L_LINENUMBER)) PARTITION BY HASH (L_ORDERKEY);
CREATE TABLE LINEITEM_0 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE LINEITEM_1 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE LINEITEM_2 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
Grennplum
create table nation ( n_nationkey integer not null,
n_name char(25) not null,
n_regionkey integer not null,
n_comment varchar(152),
primary key (n_nationkey)) ;
create table region ( r_regionkey integer not null,
r_name char(25) not null,
r_comment varchar(152),
primary key (r_regionkey)) ;
create table part ( p_partkey integer not null,
p_name varchar(55) not null,
p_mfgr char(25) not null,
p_brand char(10) not null,
p_type varchar(25) not null,
p_size integer not null,
p_container char(10) not null,
p_retailprice decimal(15,2) not null,
p_comment varchar(23) not null,
primary key (p_partkey));
create table supplier ( s_suppkey integer not null,
s_name char(25) not null,
s_address varchar(40) not null,
s_nationkey integer not null,
s_phone char(15) not null,
s_acctbal decimal(15,2) not null,
s_comment varchar(101) not null,
primary key (s_suppkey));
create table partsupp ( ps_partkey integer not null,
ps_suppkey integer not null,
ps_availqty integer not null,
ps_supplycost decimal(15,2) not null,
ps_comment varchar(199) not null,
primary key (ps_partkey, ps_suppkey));
create table customer ( c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey integer not null,
c_phone char(15) not null,
c_acctbal decimal(15,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null,
primary key(c_custkey));
create table orders ( o_orderkey integer not null,
o_custkey integer not null,
o_orderstatus char(1) not null,
o_totalprice decimal(15,2) not null,
o_orderdate date not null,
o_orderpriority char(15) not null,
o_clerk char(15) not null,
o_shippriority integer not null,
o_comment varchar(79) not null,
primary key(o_orderkey));
create table lineitem ( l_orderkey integer not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity decimal(15,2) not null,
l_extendedprice decimal(15,2) not null,
l_discount decimal(15,2) not null,
l_tax decimal(15,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null,
primary key(l_orderkey, l_linenumber));
Collect statistical information
Use the following statement to collect statistical information:
KlustronDB and Greenplum both execute the same statement:
analyze nation;
analyze region;
analyze part;
analyze supplier;
analyze partsupp;
analyze customer;
analyze orders;
analyze lineitem;
analyze lineitem(l_shipmode,l_receiptdate,l_returnflag,l_linestatus,l_orderkey,l_suppkey,l_discount,l_quantity,l_partkey);
analyze supplier(s_suppkey,s_nationkey);
analyze part(p_size,p_type,p_partkey);
analyze partsupp(ps_suppkey,ps_supplycost);
analyze nation(n_regionkey,n_nationkey,n_name);
analyze region(r_name,r_regionkey);
analyze customer(c_mktsegment,c_custkey,c_nationkey,c_name);
analyze orders(o_custkey,o_orderkey,o_orderdate,o_orderpriority,o_totalprice);
Query statement
#Q1:
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '91' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus
LIMIT 1;
#Q2:
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 6
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
LIMIT 100;
#Q3:
select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-17'
and l_shipdate > date '1995-03-17'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
LIMIT 10;
#Q4:
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= date '1995-07-01'
and o_orderdate < date '1995-07-01' + interval '3' month
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority
LIMIT 1;
#Q5:
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AFRICA'
and o_orderdate >= date '1997-01-01'
and o_orderdate < date '1997-01-01' + interval '1' year
group by
n_name
order by
revenue desc
LIMIT 1;
#Q6:
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1997-01-01'
and l_shipdate < date '1997-01-01' + interval '1' year
and l_discount between 0.05 - 0.01 and 0.05 + 0.01
and l_quantity < 25
LIMIT 1;
#Q7:
select
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and (
(n1.n_name = 'IRAN' and n2.n_name = 'PERU')
or (n1.n_name = 'PERU' and n2.n_name = 'IRAN')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year
LIMIT 1;
#Q8:
select
o_year,
sum(case
when nation = 'PERU' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'ECONOMY ANODIZED COPPER'
) as all_nations
group by
o_year
order by
o_year
LIMIT 1;
#Q9:
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%pink%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc
LIMIT 1;
#Q10:
select
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
orders,
lineitem,
nation
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date '1993-05-01'
and o_orderdate < date '1993-05-01' + interval '3' month
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
LIMIT 20;
#Q11:
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'MOZAMBIQUE'
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.0001000000
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'MOZAMBIQUE'
)
order by
value desc
LIMIT 1;
#Q12:
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and (l_shipmode = 'MAIL' or l_shipmode = 'FOB')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
and l_receiptdate < date '1994-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode
LIMIT 1;
#Q13:
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders (c_custkey, c_count)
group by
c_count
order by
custdist desc,
c_count desc
LIMIT 1;
#Q14:
select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date '1994-10-01'
and l_shipdate < date '1994-10-01' + interval '1' month
LIMIT 1;
#Q15:
create view revenue0 (supplier_no, total_revenue) as
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1994-04-01'
and l_shipdate < date '1994-04-01' + interval '3' month
group by
l_suppkey;
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey
LIMIT 1;
drop view revenue0;
#Q16:
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#11'
and p_type not like 'STANDARD BRUSHED%'
and p_size in (9, 21, 24, 30, 36, 50, 27, 39)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size
LIMIT 1;
#Q17:
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part,
(SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
p_partkey = l_partkey
and agg_partkey = l_partkey
and p_brand = 'Brand#11'
and p_container = 'SM CAN'
and l_quantity < avg_quantity
LIMIT 1;
#Q18:
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey having
sum(l_quantity) > 312
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
LIMIT 100;
#Q19:
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 5 and l_quantity <= 5 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 18 and l_quantity <= 18 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#45'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 20 and l_quantity <= 20 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
LIMIT 1;
#Q20:
select
s_name,
s_address
from
supplier,
nation
where
s_suppkey in (
select
ps_suppkey
from
partsupp,
(
select
l_partkey agg_partkey,
l_suppkey agg_suppkey,
0.5 * sum(l_quantity) AS agg_quantity
from
lineitem
where
l_shipdate >= date '1995-01-01'
and l_shipdate < date '1995-01-01' + interval '1' year
group by
l_partkey,
l_suppkey
) agg_lineitem
where
agg_partkey = ps_partkey
and agg_suppkey = ps_suppkey
and ps_partkey in (
select
p_partkey
from
part
where
p_name like 'hot%'
)
and ps_availqty > agg_quantity
)
and s_nationkey = n_nationkey
and n_name = 'KENYA'
order by
s_name
LIMIT 1;
#Q21:
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1,
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'IRAQ'
group by
s_name
order by
numwait desc,
s_name
LIMIT 100;
#Q22:
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substring(c_phone from 1 for 2) as cntrycode,
c_acctbal
from
customer
where
substring(c_phone from 1 for 2) in
('24', '27', '34', '23', '29', '15', '13')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substring(c_phone from 1 for 2) in
('24', '27', '34', '23', '29', '15', '13')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) as custsale
group by
cntrycode
order by
cntrycode
LIMIT 1;
