跳至主要內容

TPCH-Klustron vs Greenplum

Klustron大约 14 分钟

TPCH-Klustron vs Greenplum

测试概况

通过TPC-H测试工具,分别对Klustron v1.3.1和Greenplum Database 7.0.0版本进行测试,获取性能结果,结果显示在数据量为100G时,Klustron相对Greenplum 有8倍的性能提升;数据量为200G时,Klustron相对Greenplum 有3.5倍的性能提升;数据量为500G时,Klustron相对Greenplum 有1.5倍的性能提升。

测试环境

服务器规格配置

共三台机器:192.168.0.17、192.168.0.18、192.168.0.19,每台机器的配置如下:

  • 操作系统:CentOS Linux release 8.5.2111
  • CPU:AMD Ryzen 9 7950X 16-Core Processor,hread(s) per core: 2,32核
  • 内存:128GB
  • 磁盘:nvme 1.8T *1
  • 网络:Speed: 10000Mb/s

软件版本

软件名称软件版本
Klustronv1.3.1
Greenplumv7.0.0
TPC-Hv3.0.0

集群环境

Klustron

由于TPC-H是只读的性能测试,因此此集群并未部署备节点,并且关掉了fullsync(enable_fullsync=false),并且刷盘参数设置为

innodb_flush_log_at_trx_commit=2;
sync_binlog=0;

这并不会影响测试结果的准确性和公平性,也不会给Klustron的性能带来有利的优势。Klustron的XPanel组件,Cluster_mgr集群和metashard_cluster也都部署在这3台服务器上面,这些组件的性能开销很小,不会显著消耗服务器资源,也不会显著降低Klustron的TPC-H性能。

节点类型IPPORT
计算节点192.168.0.1923001
Storage-shard1-Master192.168.0.1733501
Storage-shard2-Master192.168.0.1833503
Storage-shard3-Master192.168.0.1933505
Cluster_mgr192.168.0.17/18/1923501
Metedata_cluster192.168.0.17/18/1923301
XPanel192.168.0.1810024

集群说明:

  • 计算节点:部署一个计算节点。
  • 存储节点:3个shard,每一个shard为单主,三个shard的单主分别分布在这三台机器上。
  • 管理节点和元数据集群:管理集群有三台机器组成,为3个节点,1主两备。

Greenplum

节点类型IPPORT
Master192.168.0.1754321
Segment1192.168.0.186000
Segment2192.168.0.196000

配置参数

Klustron

计算节点:

alter system set statement_timeout=6000000;
alter system set mysql_read_timeout=1200;
alter system set mysql_write_timeout=1200;
alter system set lock_timeout=1200000;
alter system set log_min_duration_statement=1200000;
alter system set effective_cache_size = '48GB';
alter system set work_mem  = '5GB';
alter system set wal_buffers='64MB';
alter system set extension.aggregate_ignore_work_mem=false;
alter system set enable_nestloop=false;
alter system set extension.max_custom_indexscan_parallelism=16;
alter system set extension.custom_hashjoin_parallelism=32;

注意:调整后需重启计算节点。

存储节点:

mysql -h 192.168.0.17 -P 33501 -upgx -ppgx_pwd
mysql -h 192.168.0.18 -P 33503 -upgx -ppgx_pwd
mysql -h 192.168.0.19 -P 33505 -upgx -ppgx_pwd

分别设置如下参数:

set persist innodb_buffer_pool_size=64*1024*1024*1024;
set persist lock_wait_timeout=1200;
set persist innodb_lock_wait_timeout=1200;    
set persist fullsync_timeout=1200000; 
set persist enable_fullsync=false;
set persist innodb_flush_log_at_trx_commit=2;
set persist sync_binlog=0;
set persist max_binlog_size=1*1024*1024*1024;

Greenplum

关闭auto_vacuum防止vacuum进程突然介入影响TPC-H性能。

在Master节点执行

gpconfig -c shared_buffers -v 48GB
gpconfig -c statement_timeout -v 6000000
gpconfig -c lock_timeout -v 1200000
gpconfig -c log_min_duration_statement -v 1200000
gpconfig -c effective_cache_size -v 48GB
gpconfig -c work_mem -v 5GB
gpconfig -c wal_buffers -v 64MB
gpconfig -c autovacuum -v false
gpstop -M fast
gpstart

测试结果

TPC-H-100G

TPC-H 数据量100G,测试结果均为测试两次择优耗时较短的一次,单位:秒。

QUERY IDKlustron v1.3.1Greenplum v7.0.0
Q14.48248.16
Q28.0523.57
Q310.09100.23
Q45.06156.42
Q522.33100.03
Q62.9536.49
Q712.3662.98
Q831.3663.17
Q929.43129.88
Q1010.6173.09
Q112.66.84
Q126.9971.68
Q1315.7373.93
Q144.0439.68
Q157.5796.99
Q163.7620.25
Q1715.41429.69
Q1839.05320.23
Q198.3744.28
Q2013.5775.38
Q2147.8203.55
Q226.2963.47
总耗时307.92439.98

TPC-H-200G

TPC-H 数据量200G,测试结果均为测试两次择优耗时较短的一次,单位:秒。

QUERY IDKlustron v1.3.1Greenplum v7.0.0
Q144.39515.72
Q226.9555.25
Q355.93205.8
Q456.83339.53
Q5114.12220.61
Q645.4783.78
Q770.03156.95
Q8112.17201.17
Q9111.07370.88
Q1063.72177.11
Q1114.9919.48
Q1264.27159.33
Q1333.62144.22
Q1453.7998.42
Q15102.97216.4
Q1617.4947.68
Q1796.64911.3
Q18177.66675.87
Q1952.4999.26
Q2078.37163.19
Q21218.36494.77
Q2220.24133.21
总耗时1631.575489.93

TPC-H-500G

TPC-H 数据量500G,测试结果均为测试两次择优耗时较短的一次,单位:秒。

QUERY IDKlustron v1.3.1Greenplum v7.0.0
Q188.79515.72
Q2167.7155.25
Q3424.11205.8
Q4162.17339.53
Q5465.63220.61
Q6132.4183.78
Q7209.25156.95
Q8467.98201.17
Q9717.72370.88
Q10266.68177.11
Q11135.1419.48
Q12128.81159.33
Q13178.35144.22
Q14174.2398.42
Q15330.5216.4
Q16641.4247.68
Q17445.27911.3
Q184191.07675.87
Q19163.699.26
Q20610.63163.19
Q21494.77
Q2295.79133.21
总耗时10197.2616281.12

建表语句

Klustron

Klustron-1.4 建表语法

注意本节使用的语法适用于Klustron-1.4,对于更老的版本,需要使用等价的语法来创建这些数据表,要点是要把 PART, SUPPLIER, PARTSUPP , CUSTOMER , ORDERS, LINEITEM 这几张表,都创建为hash分区表,并且每个表都有3个分区(本次测试的Klustron集群的shard数量为3。如果用户的Klustron集群的shard数量更多那么可以把这些表都创建由更多个分区,这样测试性能会更高), 并且各个表的具有相同hash值的分区独占同一个shard,例如hash值为0、1、2的分区分别都位于shard 1 、 shard 2 、 shard 3中.

create TABLEGROUP tpch10 partitions 3 (
        partition tpch10_auto_p0 with(hash=0,shard=1),
        partition tpch10_auto_p1 with(hash=1,shard=2),
        partition tpch10_auto_p2 with(hash=2,shard=3)
);

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152),
			    PRIMARY KEY (n_nationkey)) WITH(SHARD=all);


CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152),
			    PRIMARY KEY (r_regionkey)) WITH (SHARD=all);

CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE double NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL,
			  PRIMARY KEY (p_partkey)) PARTITION BY HASH (p_partkey) partitions 3 TABLEGROUP tpch10;



CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     double NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL,
			     PRIMARY KEY (s_suppkey)) PARTITION BY HASH (s_suppkey) partitions 3 TABLEGROUP tpch10;




CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  double  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL,
			     PRIMARY KEY (ps_partkey, ps_suppkey)) PARTITION BY HASH (ps_partkey) partitions 3 TABLEGROUP tpch10;




CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     double NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL,
			     PRIMARY KEY(c_custkey)) PARTITION BY HASH (C_CUSTKEY) partitions 3 TABLEGROUP tpch10;


CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     double NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,  
                           O_CLERK          CHAR(15) NOT NULL, 
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL,
			   PRIMARY KEY(o_orderkey)) PARTITION BY HASH (o_orderkey) partitions 3 TABLEGROUP tpch10;



CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    double NOT NULL,
                             L_EXTENDEDPRICE  double NOT NULL,
                             L_DISCOUNT    double NOT NULL,
                             L_TAX         double NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL,
			     PRIMARY KEY(l_orderkey, l_linenumber)) PARTITION BY HASH (l_orderkey) partitions 3 TABLEGROUP tpch10;

Klustron-1.3以及更老版本的建表语法

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152),
	PRIMARY KEY (n_nationkey)) WITH(SHARD=all);

CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152),
	PRIMARY KEY (R_REGIONKEY)) WITH(SHARD=all);

CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL,
	PRIMARY KEY (P_PARTKEY)) PARTITION BY HASH (P_PARTKEY);
CREATE TABLE PART_0 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE PART_1 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE PART_2 PARTITION OF PART FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);

CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL,
	PRIMARY KEY (S_SUPPKEY)) PARTITION BY HASH (S_SUPPKEY);
CREATE TABLE SUPPLIER_0 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE SUPPLIER_1 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE SUPPLIER_2 PARTITION OF SUPPLIER FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);

CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL,
	PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY)) PARTITION BY HASH (PS_PARTKEY);
CREATE TABLE PARTSUPP_0 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE PARTSUPP_1 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE PARTSUPP_2 PARTITION OF PARTSUPP FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);

CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL,
	PRIMARY KEY(C_CUSTKEY)) PARTITION BY HASH (C_CUSTKEY);
CREATE TABLE CUSTOMER_0 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE CUSTOMER_1 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE CUSTOMER_2 PARTITION OF CUSTOMER FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);

CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,  
                           O_CLERK          CHAR(15) NOT NULL, 
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL,
	PRIMARY KEY(O_ORDERKEY)) PARTITION BY HASH (O_ORDERKEY);
CREATE TABLE ORDERS_0 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE ORDERS_1 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE ORDERS_2 PARTITION OF ORDERS FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);

CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL,
	PRIMARY KEY(L_ORDERKEY, L_LINENUMBER)) PARTITION BY HASH (L_ORDERKEY);
CREATE TABLE LINEITEM_0 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 0) WITH (shard = 1);
CREATE TABLE LINEITEM_1 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 1) WITH (shard = 2);
CREATE TABLE LINEITEM_2 PARTITION OF LINEITEM FOR VALUES WITH (MODULUS 3, REMAINDER 2) WITH (shard = 3);

Grennplum

create table nation  ( n_nationkey  integer not null,
                            n_name       char(25) not null,
                            n_regionkey  integer not null,
                            n_comment    varchar(152),
			    primary key (n_nationkey)) ;

create table region  ( r_regionkey  integer not null,
                            r_name       char(25) not null,
                            r_comment    varchar(152),
			    primary key (r_regionkey)) ;

create table part  ( p_partkey     integer not null,
                          p_name        varchar(55) not null,
                          p_mfgr        char(25) not null,
                          p_brand       char(10) not null,
                          p_type        varchar(25) not null,
                          p_size        integer not null,
                          p_container   char(10) not null,
                          p_retailprice decimal(15,2) not null,
                          p_comment     varchar(23) not null,
			  primary key (p_partkey));

create table supplier ( s_suppkey     integer not null,
                             s_name        char(25) not null,
                             s_address     varchar(40) not null,
                             s_nationkey   integer not null,
                             s_phone       char(15) not null,
                             s_acctbal     decimal(15,2) not null,
                             s_comment     varchar(101) not null,
			     primary key (s_suppkey));

create table partsupp ( ps_partkey     integer not null,
                             ps_suppkey     integer not null,
                             ps_availqty    integer not null,
                             ps_supplycost  decimal(15,2)  not null,
                             ps_comment     varchar(199) not null,
			     primary key (ps_partkey, ps_suppkey));

create table customer ( c_custkey     integer not null,
                             c_name        varchar(25) not null,
                             c_address     varchar(40) not null,
                             c_nationkey   integer not null,
                             c_phone       char(15) not null,
                             c_acctbal     decimal(15,2) not null,
                             c_mktsegment  char(10) not null,
                             c_comment     varchar(117) not null,
			     primary key(c_custkey));

create table orders  ( o_orderkey       integer not null,
                           o_custkey        integer not null,
                           o_orderstatus    char(1) not null,
                           o_totalprice     decimal(15,2) not null,
                           o_orderdate      date not null,
                           o_orderpriority  char(15) not null,  
                           o_clerk          char(15) not null, 
                           o_shippriority   integer not null,
                           o_comment        varchar(79) not null,
			   primary key(o_orderkey));

create table lineitem ( l_orderkey    integer not null,
                             l_partkey     integer not null,
                             l_suppkey     integer not null,
                             l_linenumber  integer not null,
                             l_quantity    decimal(15,2) not null,
                             l_extendedprice  decimal(15,2) not null,
                             l_discount    decimal(15,2) not null,
                             l_tax         decimal(15,2) not null,
                             l_returnflag  char(1) not null,
                             l_linestatus  char(1) not null,
                             l_shipdate    date not null,
                             l_commitdate  date not null,
                             l_receiptdate date not null,
                             l_shipinstruct char(25) not null,
                             l_shipmode     char(10) not null,
                             l_comment      varchar(44) not null,
			     primary key(l_orderkey, l_linenumber));

收集统计信息

采用如下语句收集统计信息:

Klustron和Greenplum都执行相同的语句:

analyze nation;
analyze region;
analyze part;
analyze supplier;
analyze partsupp;
analyze customer;
analyze orders;
analyze lineitem;
analyze lineitem(l_shipmode,l_receiptdate,l_returnflag,l_linestatus,l_orderkey,l_suppkey,l_discount,l_quantity,l_partkey);
analyze supplier(s_suppkey,s_nationkey);
analyze part(p_size,p_type,p_partkey);
analyze partsupp(ps_suppkey,ps_supplycost);
analyze nation(n_regionkey,n_nationkey,n_name);
analyze region(r_name,r_regionkey);
analyze customer(c_mktsegment,c_custkey,c_nationkey,c_name);
analyze orders(o_custkey,o_orderkey,o_orderdate,o_orderpriority,o_totalprice);

查询语句

#Q1:
select
	l_returnflag,
	l_linestatus,
	sum(l_quantity) as sum_qty,
	sum(l_extendedprice) as sum_base_price,
	sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
	avg(l_quantity) as avg_qty,
	avg(l_extendedprice) as avg_price,
	avg(l_discount) as avg_disc,
	count(*) as count_order
from
	lineitem
where
	l_shipdate <= date '1998-12-01' - interval '91' day
group by
	l_returnflag,
	l_linestatus
order by
	l_returnflag,
	l_linestatus
LIMIT 1;
 
 
 
#Q2:
select
	s_acctbal,
	s_name,
	n_name,
	p_partkey,
	p_mfgr,
	s_address,
	s_phone,
	s_comment
from
	part,
	supplier,
	partsupp,
	nation,
	region
where
	p_partkey = ps_partkey
	and s_suppkey = ps_suppkey
	and p_size = 6
	and p_type like '%BRASS'
	and s_nationkey = n_nationkey
	and n_regionkey = r_regionkey
	and r_name = 'ASIA'
	and ps_supplycost = (
		select
			min(ps_supplycost)
		from
			partsupp,
			supplier,
			nation,
			region
		where
			p_partkey = ps_partkey
			and s_suppkey = ps_suppkey
			and s_nationkey = n_nationkey
			and n_regionkey = r_regionkey
			and r_name = 'ASIA'
	)
order by
	s_acctbal desc,
	n_name,
	s_name,
	p_partkey
LIMIT 100;
 
 
 
#Q3:
select
	l_orderkey,
	sum(l_extendedprice * (1 - l_discount)) as revenue,
	o_orderdate,
	o_shippriority
from
	customer,
	orders,
	lineitem
where
	c_mktsegment = 'BUILDING'
	and c_custkey = o_custkey
	and l_orderkey = o_orderkey
	and o_orderdate < date '1995-03-17'
	and l_shipdate > date '1995-03-17'
group by
	l_orderkey,
	o_orderdate,
	o_shippriority
order by
	revenue desc,
	o_orderdate
LIMIT 10; 
 
 
#Q4:
select
	o_orderpriority,
	count(*) as order_count
from
	orders
where
	o_orderdate >= date '1995-07-01'
	and o_orderdate < date '1995-07-01' + interval '3' month
	and exists (
		select
			*
		from
			lineitem
		where
			l_orderkey = o_orderkey
			and l_commitdate < l_receiptdate
	)
group by
	o_orderpriority
order by
	o_orderpriority
LIMIT 1;
 
 
 
#Q5:
select
	n_name,
	sum(l_extendedprice * (1 - l_discount)) as revenue
from
	customer,
	orders,
	lineitem,
	supplier,
	nation,
	region
where
	c_custkey = o_custkey
	and l_orderkey = o_orderkey
	and l_suppkey = s_suppkey
	and c_nationkey = s_nationkey
	and s_nationkey = n_nationkey
	and n_regionkey = r_regionkey
	and r_name = 'AFRICA'
	and o_orderdate >= date '1997-01-01'
	and o_orderdate < date '1997-01-01' + interval '1' year
group by
	n_name
order by
	revenue desc
LIMIT 1;
 
 
 
#Q6:
select
	sum(l_extendedprice * l_discount) as revenue
from
	lineitem
where
	l_shipdate >= date '1997-01-01'
	and l_shipdate < date '1997-01-01' + interval '1' year
	and l_discount between 0.05 - 0.01 and 0.05 + 0.01
	and l_quantity < 25
LIMIT 1;
 
 
 
#Q7:
select
	supp_nation,
	cust_nation,
	l_year,
	sum(volume) as revenue
from
	(
		select
			n1.n_name as supp_nation,
			n2.n_name as cust_nation,
			extract(year from l_shipdate) as l_year,
			l_extendedprice * (1 - l_discount) as volume
		from
			supplier,
			lineitem,
			orders,
			customer,
			nation n1,
			nation n2
		where
			s_suppkey = l_suppkey
			and o_orderkey = l_orderkey
			and c_custkey = o_custkey
			and s_nationkey = n1.n_nationkey
			and c_nationkey = n2.n_nationkey
			and (
				(n1.n_name = 'IRAN' and n2.n_name = 'PERU')
				or (n1.n_name = 'PERU' and n2.n_name = 'IRAN')
			)
			and l_shipdate between date '1995-01-01' and date '1996-12-31'
	) as shipping
group by
	supp_nation,
	cust_nation,
	l_year
order by
	supp_nation,
	cust_nation,
	l_year
LIMIT 1; 
 
 
#Q8:
select
	o_year,
	sum(case
		when nation = 'PERU' then volume
		else 0
	end) / sum(volume) as mkt_share
from
	(
		select
			extract(year from o_orderdate) as o_year,
			l_extendedprice * (1 - l_discount) as volume,
			n2.n_name as nation
		from
			part,
			supplier,
			lineitem,
			orders,
			customer,
			nation n1,
			nation n2,
			region
		where
			p_partkey = l_partkey
			and s_suppkey = l_suppkey
			and l_orderkey = o_orderkey
			and o_custkey = c_custkey
			and c_nationkey = n1.n_nationkey
			and n1.n_regionkey = r_regionkey
			and r_name = 'AMERICA'
			and s_nationkey = n2.n_nationkey
			and o_orderdate between date '1995-01-01' and date '1996-12-31'
			and p_type = 'ECONOMY ANODIZED COPPER'
	) as all_nations
group by
	o_year
order by
	o_year
LIMIT 1; 
 
 
#Q9:
select
	nation,
	o_year,
	sum(amount) as sum_profit
from
	(
		select
			n_name as nation,
			extract(year from o_orderdate) as o_year,
			l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
		from
			part,
			supplier,
			lineitem,
			partsupp,
			orders,
			nation
		where
			s_suppkey = l_suppkey
			and ps_suppkey = l_suppkey
			and ps_partkey = l_partkey
			and p_partkey = l_partkey
			and o_orderkey = l_orderkey
			and s_nationkey = n_nationkey
			and p_name like '%pink%'
	) as profit
group by
	nation,
	o_year
order by
	nation,
	o_year desc
LIMIT 1;
 
 
 
#Q10:
select
	c_custkey,
	c_name,
	sum(l_extendedprice * (1 - l_discount)) as revenue,
	c_acctbal,
	n_name,
	c_address,
	c_phone,
	c_comment
from
	customer,
	orders,
	lineitem,
	nation
where
	c_custkey = o_custkey
	and l_orderkey = o_orderkey
	and o_orderdate >= date '1993-05-01'
	and o_orderdate < date '1993-05-01' + interval '3' month
	and l_returnflag = 'R'
	and c_nationkey = n_nationkey
group by
	c_custkey,
	c_name,
	c_acctbal,
	c_phone,
	n_name,
	c_address,
	c_comment
order by
	revenue desc
LIMIT 20; 
 
 
#Q11:
select
	ps_partkey,
	sum(ps_supplycost * ps_availqty) as value
from
	partsupp,
	supplier,
	nation
where
	ps_suppkey = s_suppkey
	and s_nationkey = n_nationkey
	and n_name = 'MOZAMBIQUE'
group by
	ps_partkey having
		sum(ps_supplycost * ps_availqty) > (
			select
				sum(ps_supplycost * ps_availqty) * 0.0001000000
			from
				partsupp,
				supplier,
				nation
			where
				ps_suppkey = s_suppkey
				and s_nationkey = n_nationkey
				and n_name = 'MOZAMBIQUE'
		)
order by
	value desc
LIMIT 1; 
 
 
#Q12:
select
	l_shipmode,
	sum(case
		when o_orderpriority = '1-URGENT'
			or o_orderpriority = '2-HIGH'
			then 1
		else 0
	end) as high_line_count,
	sum(case
		when o_orderpriority <> '1-URGENT'
			and o_orderpriority <> '2-HIGH'
			then 1
		else 0
	end) as low_line_count
from
	orders,
	lineitem
where
	o_orderkey = l_orderkey
	and (l_shipmode = 'MAIL' or l_shipmode = 'FOB')
	and l_commitdate < l_receiptdate
	and l_shipdate < l_commitdate
	and l_receiptdate >= date '1994-01-01'
	and l_receiptdate < date '1994-01-01' + interval '1' year
group by
	l_shipmode
order by
	l_shipmode
LIMIT 1;
 
 
 
#Q13:
select
	c_count,
	count(*) as custdist
from
	(
		select
			c_custkey,
			count(o_orderkey)
		from
			customer left outer join orders on
				c_custkey = o_custkey
				and o_comment not like '%special%requests%'
		group by
			c_custkey
	) as c_orders (c_custkey, c_count)
group by
	c_count
order by
	custdist desc,
	c_count desc
LIMIT 1; 
 
 
#Q14:
select
	100.00 * sum(case
		when p_type like 'PROMO%'
			then l_extendedprice * (1 - l_discount)
		else 0
	end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
	lineitem,
	part
where
	l_partkey = p_partkey
	and l_shipdate >= date '1994-10-01'
	and l_shipdate < date '1994-10-01' + interval '1' month
LIMIT 1; 
 
 
#Q15:
create view revenue0 (supplier_no, total_revenue) as
	select
		l_suppkey,
		sum(l_extendedprice * (1 - l_discount))
	from
		lineitem
	where
		l_shipdate >= date '1994-04-01'
		and l_shipdate < date '1994-04-01' + interval '3' month
	group by
		l_suppkey;


select
	s_suppkey,
	s_name,
	s_address,
	s_phone,
	total_revenue
from
	supplier,
	revenue0
where
	s_suppkey = supplier_no
	and total_revenue = (
		select
			max(total_revenue)
		from
			revenue0
	)
order by
	s_suppkey
LIMIT 1;

drop view revenue0; 
 
 
#Q16:
select
	p_brand,
	p_type,
	p_size,
	count(distinct ps_suppkey) as supplier_cnt
from
	partsupp,
	part
where
	p_partkey = ps_partkey
	and p_brand <> 'Brand#11'
	and p_type not like 'STANDARD BRUSHED%'
	and p_size in (9, 21, 24, 30, 36, 50, 27, 39)
	and ps_suppkey not in (
		select
			s_suppkey
		from
			supplier
		where
			s_comment like '%Customer%Complaints%'
	)
group by
	p_brand,
	p_type,
	p_size
order by
	supplier_cnt desc,
	p_brand,
	p_type,
	p_size
LIMIT 1; 
 
 
#Q17:
select
	sum(l_extendedprice) / 7.0 as avg_yearly
from
	lineitem,
	part,
	(SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
	p_partkey = l_partkey
	and agg_partkey = l_partkey
	and p_brand = 'Brand#11'
	and p_container = 'SM CAN'
	and l_quantity < avg_quantity
LIMIT 1;
 
 
 
#Q18:
select
	c_name,
	c_custkey,
	o_orderkey,
	o_orderdate,
	o_totalprice,
	sum(l_quantity)
from
	customer,
	orders,
	lineitem
where
	o_orderkey in (
		select
			l_orderkey
		from
			lineitem
		group by
			l_orderkey having
				sum(l_quantity) > 312
	)
	and c_custkey = o_custkey
	and o_orderkey = l_orderkey
group by
	c_name,
	c_custkey,
	o_orderkey,
	o_orderdate,
	o_totalprice
order by
	o_totalprice desc,
	o_orderdate
LIMIT 100; 
 
 
#Q19:
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#12'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 5 and l_quantity <= 5 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#33'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 18 and l_quantity <= 18 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#45'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 20 and l_quantity <= 20 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
LIMIT 1; 
 
 
#Q20:
select
	s_name,
	s_address
from
	supplier,
	nation
where
	s_suppkey in (
		select
			ps_suppkey
		from
			partsupp,
			(
				select
					l_partkey agg_partkey,
					l_suppkey agg_suppkey,
					0.5 * sum(l_quantity) AS agg_quantity
				from
					lineitem
				where
					l_shipdate >= date '1995-01-01'
					and l_shipdate < date '1995-01-01' + interval '1' year
				group by
					l_partkey,
					l_suppkey
			) agg_lineitem
		where
			agg_partkey = ps_partkey
			and agg_suppkey = ps_suppkey
			and ps_partkey in (
				select
					p_partkey
				from
					part
				where
					p_name like 'hot%'
			)
			and ps_availqty > agg_quantity
	)
	and s_nationkey = n_nationkey
	and n_name = 'KENYA'
order by
	s_name
LIMIT 1;
 
 
 
#Q21:
select
	s_name,
	count(*) as numwait
from
	supplier,
	lineitem l1,
	orders,
	nation
where
	s_suppkey = l1.l_suppkey
	and o_orderkey = l1.l_orderkey
	and o_orderstatus = 'F'
	and l1.l_receiptdate > l1.l_commitdate
	and exists (
		select
			*
		from
			lineitem l2
		where
			l2.l_orderkey = l1.l_orderkey
			and l2.l_suppkey <> l1.l_suppkey
	)
	and not exists (
		select
			*
		from
			lineitem l3
		where
			l3.l_orderkey = l1.l_orderkey
			and l3.l_suppkey <> l1.l_suppkey
			and l3.l_receiptdate > l3.l_commitdate
	)
	and s_nationkey = n_nationkey
	and n_name = 'IRAQ'
group by
	s_name
order by
	numwait desc,
	s_name
LIMIT 100; 
 
 
#Q22:
select
	cntrycode,
	count(*) as numcust,
	sum(c_acctbal) as totacctbal
from
	(
		select
			substring(c_phone from 1 for 2) as cntrycode,
			c_acctbal
		from
			customer
		where
			substring(c_phone from 1 for 2) in
				('24', '27', '34', '23', '29', '15', '13')
			and c_acctbal > (
				select
					avg(c_acctbal)
				from
					customer
				where
					c_acctbal > 0.00
					and substring(c_phone from 1 for 2) in
						('24', '27', '34', '23', '29', '15', '13')
			)
			and not exists (
				select
					*
				from
					orders
				where
					o_custkey = c_custkey
			)
	) as custsale
group by
	cntrycode
order by
	cntrycode
LIMIT 1;