Klustron 空间数据管理
Klustron 空间数据管理
Klustron 支持使用PostGIS插件来管理空间数据,并且PostGIS的所有功能在Klustron中都可用。用户必须使用Klustron 自带的PostGIS插件,此插件由Klustron团队基于PostGIS-3.3.4 来开发而成,称为Klustron-PostGIS。Klustron-PostGIS插件把GIS数据按照存储节点(Klustron-storage)要求的格式,与存储节点交换GIS数据。Klustron可以使用到存储节点的RTREE索引来索引GIS数据,从而达到较好的空间数据查询性能。特别是在Klustron集群可以使用大量服务器的硬件资源,这对于GIS数据管理来说非常必要,因为每个GIS数据都较大(至少几十字节,通常数百字节到几十KB,甚至更大),占用的存储空间较大,而且空间计算消耗资源非常大,很容易耗尽单台服务器的硬件资源,所以使用集中式数据库管理GIS数据,天然地缺乏扩展能力,无法满足大量空间数据管理场景的需求。挂载Klustron-PostGIS后的Klustron集群就是一个分布式的PostGIS空间数据库集群,具有上不封顶的空间数据存储和计算能力。
本文展示Klustron-PostGIS的功能和用法。事实上,Klustron-PostGIS的用法与PostgreSQL中挂载PostGIS完全相同。
gis_test=# create extension postgis; -- 挂载Klustron-PostGIS插件
CREATE EXTENSION
gis_test=# create table village(id serial primary key, facility varchar(64) not null, coord geometry); -- geometry 是所有集合对象的基类型,也可以根据需要使用具体的某种几何类型,详见PostGIS文档。
CREATE TABLE
gis_test=# insert into village(facility, coord) values('school', st_geomfromtext('polygon((1 1, 3 1, 3 3, 1 3, 1 1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('market', st_geomfromtext('polygon((-1 -1, -3 -1, -3 -3, -1 -3, -1 -1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('post office', st_geomfromtext('polygon((-0.2 -0.2, 0.2 -0.2, 0.2 0.2, -0.2 0.2, -0.2 -0.2))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('hospital', st_geomfromtext('polygon((1 -1, 3 -1, 3 -3, 1 -3, 1 -1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('warehouses', st_geomfromtext('polygon((-1 1, -5 1, -5 6, -1 6, -1 1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('unknown place', NULL);
INSERT 0 1
gis_test=# insert into village(facility, coord) values('available area', st_geomfromtext('polygon empty'));
INSERT 0 1
下面看一下从village表中查出来的数据。其中的GIS数据是二进制编码存储的,读者无需试图理解二进制的这些GIS数据。如果用户希望输出GIS文本数据,可以使用ST_ASTEXT() 函数,例如select id, facility, st_astext(coord) from village
.
gis_test=# select*from village;
id | facility | coord
----+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----
1 | school | 01030000000100000005000000000000000000F03F000000000000F03F0000000000000840000000000000F03F00000000000008400000000000000840000000000000F03F0000000000000840000000000000F03F000000000000
F03F
2 | market | 01030000000100000005000000000000000000F0BF000000000000F0BF00000000000008C0000000000000F0BF00000000000008C000000000000008C0000000000000F0BF00000000000008C0000000000000F0BF000000000000
F0BF
3 | post office | 010300000001000000050000009A9999999999C9BF9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999
C9BF
4 | hospital | 01030000000100000005000000000000000000F03F000000000000F0BF0000000000000840000000000000F0BF000000000000084000000000000008C0000000000000F03F00000000000008C0000000000000F03F000000000000
F0BF
5 | warehouses | 01030000000100000005000000000000000000F0BF000000000000F03F00000000000014C0000000000000F03F00000000000014C00000000000001840000000000000F0BF0000000000001840000000000000F0BF000000000000
F03F
6 | unknown place |
7 | available area | 010300000000000000
(7 rows)
下面查询包含 坐标为(0,0)的点的几何图形。 根据上图数据得知,查询结果是正确的。用EXPLAIN语句查看查询计划,可以看到ST_CONTAINS这个包含关系被下推到了存储节点,使用MBRCONTAINS执行,改查询在存储节点可以使用到coord列的空间索引,如果有的话。此时我们还没有在coord列创建空间索引。查询特殊值NULL和判断是否为空也可以下推到存储节点执行。注意NULL和empty的空间数据是不同的,这就类似对于字符类型的列来说,NULL和空串''是不同的。
gis_test=# select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
id | facility | coord
----+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--
3 | post office | 010300000001000000050000009A9999999999C9BF9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C9B
F
(1 row)
gis_test=# explain select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RemotePlan (cost=101.02..101.02 rows=1 width=182)
Filter: _st_contains(coord, '010100000000000000000000000000000000000000'::geometry)
Shard: 3 Remote SQL: SELECT village.id,village.facility,village.coord FROM `gis_test_$$_public`.`village` WHERE (MBRCONTAINS(village.coord, 0x00000000010100000000000000000000000000000000000000))
(3 rows)
gis_test=# explain select*from village where coord is null;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
RemotePlan (cost=102.02..102.02 rows=1 width=182)
Shard: 3 Remote SQL: SELECT village.id,village.facility,village.coord FROM `gis_test_$$_public`.`village` WHERE (village.coord IS NULL)
(2 rows)
gis_test=# select*from village where st_isempty(coord);
id | facility | coord
----+----------------+--------------------
7 | available area | 010300000000000000
(1 row)
gis_test=# explain select*from village where st_isempty(coord);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
RemotePlan (cost=102.02..102.02 rows=1 width=182)
Shard: 3 Remote SQL: SELECT village.id,village.facility,village.coord FROM `gis_test_$$_public`.`village` WHERE ST_IsEmpty(village.coord)
(2 rows)
在village.coord列创建空间索引,然后EXPLAIN 输出查询计划。从查询计划看不到创建空间索引之前与之后,查询计划的区别。不过在下图中可以看到,在存储节点中explain 发给存储节点的查询语句, 有了空间索引后,该索引被使用来执行存储节点收到的查询了。
gis_test=#
gis_test=#
gis_test=# create index vil_coord_idx on village using gist(coord);
CREATE INDEX
gis_test=# explain select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RemotePlan (cost=102.02..102.02 rows=1 width=182)
Filter: _st_contains(coord, '010100000000000000000000000000000000000000'::geometry)
Shard: 3 Remote SQL: SELECT village.id,village.facility,village.coord FROM `gis_test_$$_public`.`village` WHERE (MBRCONTAINS(village.coord, 0x00000000010100000000000000000000000000000000000000))
(3 rows)
gis_test=# select*from village where coord is null;
id | facility | coord
----+---------------+-------
6 | unknown place |
(1 row)