跳至主要內容

Klustron 空间数据管理

Klustron大约 4 分钟

Klustron 空间数据管理

Klustron 支持使用PostGIS插件来管理空间数据,并且PostGIS的所有功能在Klustron中都可用。用户必须使用Klustron 自带的PostGIS插件,此插件由Klustron团队基于PostGIS-3.3.4 来开发而成,称为Klustron-PostGIS。Klustron-PostGIS插件把GIS数据按照存储节点(Klustron-storage)要求的格式,与存储节点交换GIS数据。Klustron可以使用到存储节点的RTREE索引来索引GIS数据,从而达到较好的空间数据查询性能。特别是在Klustron集群可以使用大量服务器的硬件资源,这对于GIS数据管理来说非常必要,因为每个GIS数据都较大(至少几十字节,通常数百字节到几十KB,甚至更大),占用的存储空间较大,而且空间计算消耗资源非常大,很容易耗尽单台服务器的硬件资源,所以使用集中式数据库管理GIS数据,天然地缺乏扩展能力,无法满足大量空间数据管理场景的需求。挂载Klustron-PostGIS后的Klustron集群就是一个分布式的PostGIS空间数据库集群,具有上不封顶的空间数据存储和计算能力。

本文展示Klustron-PostGIS的功能和用法。事实上,Klustron-PostGIS的用法与PostgreSQL中挂载PostGIS完全相同。


gis_test=# create extension postgis; -- 挂载Klustron-PostGIS插件
CREATE EXTENSION
gis_test=# create table village(id serial primary key, facility varchar(64) not null, coord geometry); -- geometry 是所有集合对象的基类型,也可以根据需要使用具体的某种几何类型,详见PostGIS文档。
CREATE TABLE
gis_test=# insert into village(facility, coord) values('school', st_geomfromtext('polygon((1 1, 3 1, 3 3, 1 3, 1 1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('market', st_geomfromtext('polygon((-1 -1, -3 -1, -3 -3, -1 -3, -1 -1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('post office', st_geomfromtext('polygon((-0.2 -0.2, 0.2 -0.2, 0.2 0.2, -0.2 0.2, -0.2 -0.2))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('hospital', st_geomfromtext('polygon((1 -1, 3 -1, 3 -3, 1 -3, 1 -1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('warehouses', st_geomfromtext('polygon((-1 1, -5 1, -5 6, -1 6, -1 1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('unknown place', NULL);
INSERT 0 1
gis_test=# insert into village(facility, coord) values('available area', st_geomfromtext('polygon empty'));
INSERT 0 1

下面看一下从village表中查出来的数据。其中的GIS数据是二进制编码存储的,读者无需试图理解二进制的这些GIS数据。如果用户希望输出GIS文本数据,可以使用ST_ASTEXT() 函数,例如select id, facility, st_astext(coord) from village.

gis_test=# select*from village;
 id |    facility    |                                                                                           coord                                                                                       
     
----+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----
  1 | school         | 01030000000100000005000000000000000000F03F000000000000F03F0000000000000840000000000000F03F00000000000008400000000000000840000000000000F03F0000000000000840000000000000F03F000000000000
F03F
  2 | market         | 01030000000100000005000000000000000000F0BF000000000000F0BF00000000000008C0000000000000F0BF00000000000008C000000000000008C0000000000000F0BF00000000000008C0000000000000F0BF000000000000
F0BF
  3 | post office    | 010300000001000000050000009A9999999999C9BF9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999
C9BF
  4 | hospital       | 01030000000100000005000000000000000000F03F000000000000F0BF0000000000000840000000000000F0BF000000000000084000000000000008C0000000000000F03F00000000000008C0000000000000F03F000000000000
F0BF
  5 | warehouses     | 01030000000100000005000000000000000000F0BF000000000000F03F00000000000014C0000000000000F03F00000000000014C00000000000001840000000000000F0BF0000000000001840000000000000F0BF000000000000
F03F
  6 | unknown place  | 
  7 | available area | 010300000000000000
(7 rows)

下面查询包含 坐标为(0,0)的点的几何图形。 根据上图数据得知,查询结果是正确的。用EXPLAIN语句查看查询计划,可以看到ST_CONTAINS这个包含关系被下推到了存储节点,使用MBRCONTAINS执行,改查询在存储节点可以使用到coord列的空间索引,如果有的话。此时我们还没有在coord列创建空间索引。查询特殊值NULL和判断是否为空也可以下推到存储节点执行。注意NULL和empty的空间数据是不同的,这就类似对于字符类型的列来说,NULL和空串''是不同的。

gis_test=# select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
 id |  facility   |                                                                                           coord                                                                                          
  
----+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--
  3 | post office | 010300000001000000050000009A9999999999C9BF9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C9B
F
(1 row)

gis_test=# explain select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=101.02..101.02 rows=1 width=182)
   Filter: _st_contains(coord, '010100000000000000000000000000000000000000'::geometry)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE (MBRCONTAINS(village.coord, 0x00000000010100000000000000000000000000000000000000))
(3 rows)


gis_test=# explain select*from village where coord is null;
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=102.02..102.02 rows=1 width=182)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE (village.coord IS NULL)
(2 rows)


gis_test=# select*from village where st_isempty(coord);
 id |    facility    |       coord        
----+----------------+--------------------
  7 | available area | 010300000000000000
(1 row)

gis_test=# explain select*from village where st_isempty(coord);
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=102.02..102.02 rows=1 width=182)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE ST_IsEmpty(village.coord)
(2 rows)

在village.coord列创建空间索引,然后EXPLAIN 输出查询计划。从查询计划看不到创建空间索引之前与之后,查询计划的区别。不过在下图中可以看到,在存储节点中explain 发给存储节点的查询语句, 有了空间索引后,该索引被使用来执行存储节点收到的查询了。

gis_test=# 
gis_test=# 
gis_test=# create index vil_coord_idx on village using gist(coord);
CREATE INDEX
gis_test=# explain select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=102.02..102.02 rows=1 width=182)
   Filter: _st_contains(coord, '010100000000000000000000000000000000000000'::geometry)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE (MBRCONTAINS(village.coord, 0x00000000010100000000000000000000000000000000000000))
(3 rows)

gis_test=# select*from village where coord is null;
 id |   facility    | coord 
----+---------------+-------
  6 | unknown place | 
(1 row)

存储节点执行的空间查询的查询计划