TPCH-Klustron vs Greenplum
TPCH-Klustron vs Greenplum
测试概况
通过TPC-H测试工具,分别对Klustron v1.3.1和Greenplum Database 7.0.0版本进行测试,获取性能结果,结果显示在数据量为100G时,Klustron相对Greenplum 有8倍的性能提升;数据量为200G时,Klustron相对Greenplum 有3.5倍的性能提升;数据量为500G时,Klustron相对Greenplum 有1.5倍的性能提升。
测试环境
服务器规格配置
共三台机器:192.168.0.17、192.168.0.18、192.168.0.19,每台机器的配置如下:
- 操作系统:CentOS Linux release 8.5.2111
- CPU:AMD Ryzen 9 7950X 16-Core Processor,hread(s) per core: 2,32核
- 内存:128GB
- 磁盘:nvme 1.8T *1
- 网络:Speed: 10000Mb/s
软件版本
软件名称 | 软件版本 |
---|---|
Klustron | v1.3.1 |
Greenplum | v7.0.0 |
TPC-H | v3.0.0 |
集群环境
Klustron
由于TPC-H是只读的性能测试,因此此集群并未部署备节点,并且关掉了fullsync(enable_fullsync=false),并且刷盘参数设置为
innodb_flush_log_at_trx_commit=2;
sync_binlog=0;
这并不会影响测试结果的准确性和公平性,也不会给Klustron的性能带来有利的优势。Klustron的XPanel组件,Cluster_mgr集群和metashard_cluster也都部署在这3台服务器上面,这些组件的性能开销很小,不会显著消耗服务器资源,也不会显著降低Klustron的TPC-H性能。
节点类型 | IP | PORT |
---|---|---|
计算节点 | 192.168.0.19 | 23001 |
Storage-shard1-Master | 192.168.0.17 | 33501 |
Storage-shard2-Master | 192.168.0.18 | 33503 |
Storage-shard3-Master | 192.168.0.19 | 33505 |
Cluster_mgr | 192.168.0.17/18/19 | 23501 |
Metedata_cluster | 192.168.0.17/18/19 | 23301 |
XPanel | 192.168.0.18 | 10024 |
集群说明:
- 计算节点:部署一个计算节点。
- 存储节点:3个shard,每一个shard为单主,三个shard的单主分别分布在这三台机器上。
- 管理节点和元数据集群:管理集群有三台机器组成,为3个节点,1主两备。
Greenplum
节点类型 | IP | PORT |
---|---|---|
Master | 192.168.0.17 | 54321 |
Segment1 | 192.168.0.18 | 6000 |
Segment2 | 192.168.0.19 | 6000 |
配置参数
Klustron
计算节点:
alter system set statement_timeout=6000000;
alter system set mysql_read_timeout=1200;
alter system set mysql_write_timeout=1200;
alter system set lock_timeout=1200000;
alter system set log_min_duration_statement=1200000;
alter system set effective_cache_size = '48GB';
alter system set work_mem = '5GB';
alter system set wal_buffers='64MB';
alter system set extension.aggregate_ignore_work_mem=false;
alter system set enable_nestloop=false;
alter system set extension.max_custom_indexscan_parallelism=16;
alter system set extension.custom_hashjoin_parallelism=32;
注意:调整后需重启计算节点。
存储节点:
mysql -h 192.168.0.17 -P 33501 -upgx -ppgx_pwd
mysql -h 192.168.0.18 -P 33503 -upgx -ppgx_pwd
mysql -h 192.168.0.19 -P 33505 -upgx -ppgx_pwd
分别设置如下参数:
set persist innodb_buffer_pool_size=64*1024*1024*1024;
set persist lock_wait_timeout=1200;
set persist innodb_lock_wait_timeout=1200;
set persist fullsync_timeout=1200000;
set persist enable_fullsync=false;
set persist innodb_flush_log_at_trx_commit=2;
set persist sync_binlog=0;
set persist max_binlog_size=1*1024*1024*1024;
Greenplum
关闭auto_vacuum防止vacuum进程突然介入影响TPC-H性能。
在Master节点执行
gpconfig -c shared_buffers -v 48GB
gpconfig -c statement_timeout -v 6000000
gpconfig -c lock_timeout -v 1200000
gpconfig -c log_min_duration_statement -v 1200000
gpconfig -c effective_cache_size -v 48GB
gpconfig -c work_mem -v 5GB
gpconfig -c wal_buffers -v 64MB
gpconfig -c autovacuum -v false
gpstop -M fast
gpstart
测试结果
TPC-H-100G
TPC-H 数据量100G,测试结果均为测试两次择优耗时较短的一次,单位:秒。
QUERY ID | Klustron v1.3.1 | Greenplum v7.0.0 |
---|---|---|
Q1 | 4.48 | 248.16 |
Q2 | 8.05 | 23.57 |
Q3 | 10.09 | 100.23 |
Q4 | 5.06 | 156.42 |
Q5 | 22.33 | 100.03 |
Q6 | 2.95 | 36.49 |
Q7 | 12.36 | 62.98 |
Q8 | 31.36 | 63.17 |
Q9 | 29.43 | 129.88 |
Q10 | 10.61 | 73.09 |
Q11 | 2.6 | 6.84 |
Q12 | 6.99 | 71.68 |
Q13 | 15.73 | 73.93 |
Q14 | 4.04 | 39.68 |
Q15 | 7.57 | 96.99 |
Q16 | 3.76 | 20.25 |
Q17 | 15.41 | 429.69 |
Q18 | 39.05 | 320.23 |
Q19 | 8.37 | 44.28 |
Q20 | 13.57 | 75.38 |
Q21 | 47.8 | 203.55 |
Q22 | 6.29 | 63.47 |
总耗时 | 307.9 | 2439.98 |
TPC-H-200G
TPC-H 数据量200G,测试结果均为测试两次择优耗时较短的一次,单位:秒。
QUERY ID | Klustron v1.3.1 | Greenplum v7.0.0 |
---|---|---|
Q1 | 44.39 | 515.72 |
Q2 | 26.95 | 55.25 |
Q3 | 55.93 | 205.8 |
Q4 | 56.83 | 339.53 |
Q5 | 114.12 | 220.61 |
Q6 | 45.47 | 83.78 |
Q7 | 70.03 | 156.95 |
Q8 | 112.17 | 201.17 |
Q9 | 111.07 | 370.88 |
Q10 | 63.72 | 177.11 |
Q11 | 14.99 | 19.48 |
Q12 | 64.27 | 159.33 |
Q13 | 33.62 | 144.22 |
Q14 | 53.79 | 98.42 |
Q15 | 102.97 | 216.4 |
Q16 | 17.49 | 47.68 |
Q17 | 96.64 | 911.3 |
Q18 | 177.66 | 675.87 |
Q19 | 52.49 | 99.26 |
Q20 | 78.37 | 163.19 |
Q21 | 218.36 | 494.77 |
Q22 | 20.24 | 133.21 |
总耗时 | 1631.57 | 5489.93 |
TPC-H-500G
TPC-H 数据量500G,测试结果均为测试两次择优耗时较短的一次,单位:秒。
QUERY ID | Klustron v1.3.1 | Greenplum v7.0.0 |
---|---|---|
Q1 | 88.79 | 515.72 |
Q2 | 167.71 | 55.25 |
Q3 | 424.11 | 205.8 |
Q4 | 162.17 | 339.53 |
Q5 | 465.63 | 220.61 |
Q6 | 132.41 | 83.78 |
Q7 | 209.25 | 156.95 |
Q8 | 467.98 | 201.17 |
Q9 | 717.72 | 370.88 |
Q10 | 266.68 | 177.11 |
Q11 | 135.14 | 19.48 |
Q12 | 128.81 | 159.33 |
Q13 | 178.35 | 144.22 |
Q14 | 174.23 | 98.42 |
Q15 | 330.5 | 216.4 |
Q16 | 641.42 | 47.68 |
Q17 | 445.27 | 911.3 |
Q18 | 4191.07 | 675.87 |
Q19 | 163.6 | 99.26 |
Q20 | 610.63 | 163.19 |
Q21 | 494.77 | |
Q22 | 95.79 | 133.21 |
总耗时 | 10197.26 | 16281.12 |
建表语句
Klustron
Klustron-1.4 建表语法
注意本节使用的语法适用于Klustron-1.4,对于更老的版本,需要使用等价的语法来创建这些数据表,要点是要把 PART, SUPPLIER, PARTSUPP , CUSTOMER , ORDERS, LINEITEM 这几张表,都创建为hash分区表,并且每个表都有3个分区(本次测试的Klustron集群的shard数量为3。如果用户的Klustron集群的shard数量更多那么可以把这些表都创建由更多个分区,这样测试性能会更高), 并且各个表的具有相同hash值的分区独占同一个shard,例如hash值为0、1、2的分区分别都位于shard 1 、 shard 2 、 shard 3中.
create TABLEGROUP tpch10 partitions 3 (
partition tpch10_auto_p0 with(hash=0,shard=1),
partition tpch10_auto_p1 with(hash=1,shard=2),
partition tpch10_auto_p2 with(hash=2,shard=3)
);
CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
N_NAME CHAR(25) NOT NULL,
N_REGIONKEY INTEGER NOT NULL,
N_COMMENT VARCHAR(152),
PRIMARY KEY (n_nationkey)) WITH(SHARD=all);
CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
R_NAME CHAR(25) NOT NULL,
R_COMMENT VARCHAR(152),
PRIMARY KEY (r_regionkey)) WITH (SHARD=all);
CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
P_NAME VARCHAR(55) NOT NULL,
P_MFGR CHAR(25) NOT NULL,
P_BRAND CHAR(10) NOT NULL,
P_TYPE VARCHAR(25) NOT NULL,
P_SIZE INTEGER NOT NULL,
P_CONTAINER CHAR(10) NOT NULL,
P_RETAILPRICE double NOT NULL,
P_COMMENT VARCHAR(23) NOT NULL,
PRIMARY KEY (p_partkey)) PARTITION BY HASH (p_partkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
S_NAME CHAR(25) NOT NULL,
S_ADDRESS VARCHAR(40) NOT NULL,
S_NATIONKEY INTEGER NOT NULL,
S_PHONE CHAR(15) NOT NULL,
S_ACCTBAL double NOT NULL,
S_COMMENT VARCHAR(101) NOT NULL,
PRIMARY KEY (s_suppkey)) PARTITION BY HASH (s_suppkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
PS_SUPPLYCOST double NOT NULL,
PS_COMMENT VARCHAR(199) NOT NULL,
PRIMARY KEY (ps_partkey, ps_suppkey)) PARTITION BY HASH (ps_partkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL double NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL,
PRIMARY KEY(c_custkey)) PARTITION BY HASH (C_CUSTKEY) partitions 3 TABLEGROUP tpch10;
CREATE TABLE ORDERS ( O_ORDERKEY INTEGER NOT NULL,
O_CUSTKEY INTEGER NOT NULL,
O_ORDERSTATUS CHAR(1) NOT NULL,
O_TOTALPRICE double NOT NULL,
O_ORDERDATE DATE NOT NULL,
O_ORDERPRIORITY CHAR(15) NOT NULL,
O_CLERK CHAR(15) NOT NULL,
O_SHIPPRIORITY INTEGER NOT NULL,
O_COMMENT VARCHAR(79) NOT NULL,
PRIMARY KEY(o_orderkey)) PARTITION BY HASH (o_orderkey) partitions 3 TABLEGROUP tpch10;
CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY double NOT NULL,
L_EXTENDEDPRICE double NOT NULL,
L_DISCOUNT double NOT NULL,
L_TAX double NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL,
PRIMARY KEY(l_orderkey, l_linenumber)) PARTITION BY HASH (l_orderkey) partitions 3 TABLEGROUP tpch10;
Klustron-1.3以及更老版本的建表语法
CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
N_NAME CHAR(25) NOT NULL,
N_REGIONKEY INTEGER NOT NULL,
N_COMMENT VARCHAR(152),
PRIMARY KEY (n_nationkey)) WITH(SHARD=all);
CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
R_NAME CHAR(25) NOT NULL,
R_COMMENT VARCHAR(152),
PRIMARY KEY (R_REGIONKEY)) WITH(SHARD=all);
CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
P_NAME VARCHAR(55) NOT NULL,
P_MFGR CHAR(25) NOT NULL,
P_BRAND CHAR(10) NOT NULL,
P_TYPE VARCHAR(25) NOT NULL,
P_SIZE INTEGER NOT NULL,
P_CONTAINER CHAR(10) NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT VARCHAR(23) NOT NULL,
PRIMARY KEY (P_PARTKEY)) PARTITION BY HASH (P_PARTKEY);
CREATE TABLE PART_0 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE PART_1 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE PART_2 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
S_NAME CHAR(25) NOT NULL,
S_ADDRESS VARCHAR(40) NOT NULL,
S_NATIONKEY INTEGER NOT NULL,
S_PHONE CHAR(15) NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT VARCHAR(101) NOT NULL,
PRIMARY KEY (S_SUPPKEY)) PARTITION BY HASH (S_SUPPKEY);
CREATE TABLE SUPPLIER_0 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE SUPPLIER_1 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE SUPPLIER_2 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT VARCHAR(199) NOT NULL,
PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY)) PARTITION BY HASH (PS_PARTKEY);
CREATE TABLE PARTSUPP_0 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE PARTSUPP_1 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE PARTSUPP_2 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL,
PRIMARY KEY(C_CUSTKEY)) PARTITION BY HASH (C_CUSTKEY);
CREATE TABLE CUSTOMER_0 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE CUSTOMER_1 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE CUSTOMER_2 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE ORDERS ( O_ORDERKEY INTEGER NOT NULL,
O_CUSTKEY INTEGER NOT NULL,
O_ORDERSTATUS CHAR(1) NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE DATE NOT NULL,
O_ORDERPRIORITY CHAR(15) NOT NULL,
O_CLERK CHAR(15) NOT NULL,
O_SHIPPRIORITY INTEGER NOT NULL,
O_COMMENT VARCHAR(79) NOT NULL,
PRIMARY KEY(O_ORDERKEY)) PARTITION BY HASH (O_ORDERKEY);
CREATE TABLE ORDERS_0 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE ORDERS_1 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE ORDERS_2 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL,
PRIMARY KEY(L_ORDERKEY, L_LINENUMBER)) PARTITION BY HASH (L_ORDERKEY);
CREATE TABLE LINEITEM_0 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE LINEITEM_1 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE LINEITEM_2 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);
Grennplum
create table nation ( n_nationkey integer not null,
n_name char(25) not null,
n_regionkey integer not null,
n_comment varchar(152),
primary key (n_nationkey)) ;
create table region ( r_regionkey integer not null,
r_name char(25) not null,
r_comment varchar(152),
primary key (r_regionkey)) ;
create table part ( p_partkey integer not null,
p_name varchar(55) not null,
p_mfgr char(25) not null,
p_brand char(10) not null,
p_type varchar(25) not null,
p_size integer not null,
p_container char(10) not null,
p_retailprice decimal(15,2) not null,
p_comment varchar(23) not null,
primary key (p_partkey));
create table supplier ( s_suppkey integer not null,
s_name char(25) not null,
s_address varchar(40) not null,
s_nationkey integer not null,
s_phone char(15) not null,
s_acctbal decimal(15,2) not null,
s_comment varchar(101) not null,
primary key (s_suppkey));
create table partsupp ( ps_partkey integer not null,
ps_suppkey integer not null,
ps_availqty integer not null,
ps_supplycost decimal(15,2) not null,
ps_comment varchar(199) not null,
primary key (ps_partkey, ps_suppkey));
create table customer ( c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey integer not null,
c_phone char(15) not null,
c_acctbal decimal(15,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null,
primary key(c_custkey));
create table orders ( o_orderkey integer not null,
o_custkey integer not null,
o_orderstatus char(1) not null,
o_totalprice decimal(15,2) not null,
o_orderdate date not null,
o_orderpriority char(15) not null,
o_clerk char(15) not null,
o_shippriority integer not null,
o_comment varchar(79) not null,
primary key(o_orderkey));
create table lineitem ( l_orderkey integer not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity decimal(15,2) not null,
l_extendedprice decimal(15,2) not null,
l_discount decimal(15,2) not null,
l_tax decimal(15,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null,
primary key(l_orderkey, l_linenumber));
收集统计信息
采用如下语句收集统计信息:
Klustron和Greenplum都执行相同的语句:
analyze nation;
analyze region;
analyze part;
analyze supplier;
analyze partsupp;
analyze customer;
analyze orders;
analyze lineitem;
analyze lineitem(l_shipmode,l_receiptdate,l_returnflag,l_linestatus,l_orderkey,l_suppkey,l_discount,l_quantity,l_partkey);
analyze supplier(s_suppkey,s_nationkey);
analyze part(p_size,p_type,p_partkey);
analyze partsupp(ps_suppkey,ps_supplycost);
analyze nation(n_regionkey,n_nationkey,n_name);
analyze region(r_name,r_regionkey);
analyze customer(c_mktsegment,c_custkey,c_nationkey,c_name);
analyze orders(o_custkey,o_orderkey,o_orderdate,o_orderpriority,o_totalprice);
查询语句
#Q1:
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '91' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus
LIMIT 1;
#Q2:
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 6
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
LIMIT 100;
#Q3:
select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-17'
and l_shipdate > date '1995-03-17'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
LIMIT 10;
#Q4:
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= date '1995-07-01'
and o_orderdate < date '1995-07-01' + interval '3' month
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority
LIMIT 1;
#Q5:
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AFRICA'
and o_orderdate >= date '1997-01-01'
and o_orderdate < date '1997-01-01' + interval '1' year
group by
n_name
order by
revenue desc
LIMIT 1;
#Q6:
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1997-01-01'
and l_shipdate < date '1997-01-01' + interval '1' year
and l_discount between 0.05 - 0.01 and 0.05 + 0.01
and l_quantity < 25
LIMIT 1;
#Q7:
select
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and (
(n1.n_name = 'IRAN' and n2.n_name = 'PERU')
or (n1.n_name = 'PERU' and n2.n_name = 'IRAN')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year
LIMIT 1;
#Q8:
select
o_year,
sum(case
when nation = 'PERU' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'ECONOMY ANODIZED COPPER'
) as all_nations
group by
o_year
order by
o_year
LIMIT 1;
#Q9:
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%pink%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc
LIMIT 1;
#Q10:
select
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
orders,
lineitem,
nation
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date '1993-05-01'
and o_orderdate < date '1993-05-01' + interval '3' month
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
LIMIT 20;
#Q11:
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'MOZAMBIQUE'
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.0001000000
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'MOZAMBIQUE'
)
order by
value desc
LIMIT 1;
#Q12:
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and (l_shipmode = 'MAIL' or l_shipmode = 'FOB')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
and l_receiptdate < date '1994-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode
LIMIT 1;
#Q13:
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders (c_custkey, c_count)
group by
c_count
order by
custdist desc,
c_count desc
LIMIT 1;
#Q14:
select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date '1994-10-01'
and l_shipdate < date '1994-10-01' + interval '1' month
LIMIT 1;
#Q15:
create view revenue0 (supplier_no, total_revenue) as
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1994-04-01'
and l_shipdate < date '1994-04-01' + interval '3' month
group by
l_suppkey;
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey
LIMIT 1;
drop view revenue0;
#Q16:
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#11'
and p_type not like 'STANDARD BRUSHED%'
and p_size in (9, 21, 24, 30, 36, 50, 27, 39)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size
LIMIT 1;
#Q17:
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part,
(SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
p_partkey = l_partkey
and agg_partkey = l_partkey
and p_brand = 'Brand#11'
and p_container = 'SM CAN'
and l_quantity < avg_quantity
LIMIT 1;
#Q18:
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey having
sum(l_quantity) > 312
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
LIMIT 100;
#Q19:
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 5 and l_quantity <= 5 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 18 and l_quantity <= 18 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#45'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 20 and l_quantity <= 20 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
LIMIT 1;
#Q20:
select
s_name,
s_address
from
supplier,
nation
where
s_suppkey in (
select
ps_suppkey
from
partsupp,
(
select
l_partkey agg_partkey,
l_suppkey agg_suppkey,
0.5 * sum(l_quantity) AS agg_quantity
from
lineitem
where
l_shipdate >= date '1995-01-01'
and l_shipdate < date '1995-01-01' + interval '1' year
group by
l_partkey,
l_suppkey
) agg_lineitem
where
agg_partkey = ps_partkey
and agg_suppkey = ps_suppkey
and ps_partkey in (
select
p_partkey
from
part
where
p_name like 'hot%'
)
and ps_availqty > agg_quantity
)
and s_nationkey = n_nationkey
and n_name = 'KENYA'
order by
s_name
LIMIT 1;
#Q21:
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1,
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'IRAQ'
group by
s_name
order by
numwait desc,
s_name
LIMIT 100;
#Q22:
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substring(c_phone from 1 for 2) as cntrycode,
c_acctbal
from
customer
where
substring(c_phone from 1 for 2) in
('24', '27', '34', '23', '29', '15', '13')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substring(c_phone from 1 for 2) in
('24', '27', '34', '23', '29', '15', '13')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) as custsale
group by
cntrycode
order by
cntrycode
LIMIT 1;