在 postgresql 中计算最适合的地理位置



我有一个查询来查找更大区域内最合适的区域。最佳拟合的定义是当小区域的质心在大区域内时,如下所示:

select p.id as parent_area_id, c.id as child_area_id
from fog.area p
inner join fog.area c on ST_CONTAINS(p.shape, ST_CENTROID(c.shape))
inner join fog.area_type_group_flattened gc on gc.child_type_id = c.type_id
inner join fog.area_type_group_flattened gp on gp.child_type_id = p.type_id
inner join fog.area_type_hierarchy h on h.parent_type_id = gp.parent_type_id and h.child_type_id = gc.parent_type_id
where p.id = :parent and c.type_id != :excludingchildtype

在 180 秒内返回 28 行。

我在形状列上有一个几何类型的要点索引。如果我用_ST_CONTAINS替换ST_CONTAINS,则需要相同的时间,这表明它不使用索引,或者它的使用没有太大影响。带有几何列的表包含 244,325 行,因此并不大。

如果我用 c.shape && p.shape 替换 ST_CONTAINS,它会在 395 毫秒内返回 280 行。所以性能要好得多,但它返回的行太多了。

我想我可以使用&&连接,然后在 where 子句中使用 ST_CONTAINS,这样昂贵的部分就可以在一个小的结果集上运行,但这需要 39 秒。

有没有一种好方法可以在不更改结果的情况下提高此查询的性能?或者类似的技术,大致相同但速度更快?

Postgres 版本是 9.4.4

解释分析

Nested Loop  (cost=287.00..21510.84 rows=1 width=8) (actual time=13067.916..75160.246 rows=135 loops=1)
  Join Filter: ((t1.child_type_id = p.type_id) AND (p.shape && st_centroid(c.shape)) AND _st_contains(p.shape, st_centroid(c.shape)))
  Rows Removed by Join Filter: 2933423
  ->  Index Scan using area_pkey on area p  (cost=0.42..8.44 rows=1 width=8633) (actual time=0.123..0.125 rows=1 loops=1)
        Index Cond: (id = 246420)
  ->  Nested Loop  (cost=286.58..21376.31 rows=467 width=8633) (actual time=2.339..46664.858 rows=2933558 loops=1)
        ->  Nested Loop  (cost=286.16..288.61 rows=1 width=8) (actual time=2.172..63.684 rows=198 loops=1)
              Join Filter: (h.parent_type_id = t1.parent_type_id)
              Rows Removed by Join Filter: 7862
              ->  HashAggregate  (cost=143.06..143.07 rows=1 width=8) (actual time=0.963..1.073 rows=62 loops=1)
                    Group Key: t1.parent_type_id, t1.child_type_id
                    CTE types
                      ->  Recursive Union  (cost=0.00..135.70 rows=199 width=8) (actual time=0.043..0.448 rows=66 loops=1)
                            ->  Seq Scan on area_type area_type_1  (cost=0.00..2.49 rows=49 width=4) (actual time=0.040..0.087 rows=50 loops=1)
                            ->  Hash Join  (cost=1.14..12.92 rows=15 width=8) (actual time=0.075..0.112 rows=8 loops=2)
                                  Hash Cond: (t_1.child_type_id = g_1.parent_type_id)
                                  ->  WorkTable Scan on types t_1  (cost=0.00..9.80 rows=490 width=4) (actual time=0.002..0.032 rows=33 loops=2)
                                  ->  Hash  (cost=1.06..1.06 rows=6 width=8) (actual time=0.057..0.057 rows=16 loops=1)
                                        Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                        ->  Seq Scan on area_type_group_member g_1  (cost=0.00..1.06 rows=6 width=8) (actual time=0.019..0.035 rows=16 loops=1)
                    ->  Hash Join  (cost=2.62..7.36 rows=1 width=8) (actual time=0.223..0.884 rows=62 loops=1)
                          Hash Cond: (t1.child_type_id = t2.id)
                          ->  CTE Scan on types t1  (cost=0.00..3.98 rows=199 width=8) (actual time=0.046..0.579 rows=66 loops=1)
                          ->  Hash  (cost=2.61..2.61 rows=1 width=4) (actual time=0.146..0.146 rows=46 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 2kB
                                ->  Seq Scan on area_type t2  (cost=0.00..2.61 rows=1 width=4) (actual time=0.016..0.094 rows=46 loops=1)
                                      Filter: (COALESCE(is_group, B'0'::bit(1)) = B'0'::bit(1))
                                      Rows Removed by Filter: 4
              ->  Hash Join  (cost=143.09..145.50 rows=2 width=8) (actual time=0.026..0.512 rows=130 loops=62)
                    Hash Cond: (h.child_type_id = gc.parent_type_id)
                    ->  Seq Scan on area_type_hierarchy h  (cost=0.00..2.01 rows=101 width=8) (actual time=0.005..0.103 rows=103 loops=62)
                    ->  Hash  (cost=143.08..143.08 rows=1 width=8) (actual time=1.116..1.116 rows=62 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 3kB
                          ->  Subquery Scan on gc  (cost=143.06..143.08 rows=1 width=8) (actual time=0.874..1.047 rows=62 loops=1)
                                ->  HashAggregate  (cost=143.06..143.07 rows=1 width=8) (actual time=0.872..0.931 rows=62 loops=1)
                                      Group Key: t1_1.parent_type_id, t1_1.child_type_id
                                      CTE types
                                        ->  Recursive Union  (cost=0.00..135.70 rows=199 width=8) (actual time=0.005..0.388 rows=66 loops=1)
                                              ->  Seq Scan on area_type  (cost=0.00..2.49 rows=49 width=4) (actual time=0.003..0.052 rows=50 loops=1)
                                              ->  Hash Join  (cost=1.14..12.92 rows=15 width=8) (actual time=0.065..0.103 rows=8 loops=2)
                                                    Hash Cond: (t.child_type_id = g.parent_type_id)
                                                    ->  WorkTable Scan on types t  (cost=0.00..9.80 rows=490 width=4) (actual time=0.001..0.030 rows=33 loops=2)
                                                    ->  Hash  (cost=1.06..1.06 rows=6 width=8) (actual time=0.039..0.039 rows=16 loops=1)
                                                          Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                                          ->  Seq Scan on area_type_group_member g  (cost=0.00..1.06 rows=6 width=8) (actual time=0.002..0.018 rows=16 loops=1)
                                      ->  Hash Join  (cost=2.62..7.36 rows=1 width=8) (actual time=0.153..0.797 rows=62 loops=1)
                                            Hash Cond: (t1_1.child_type_id = t2_1.id)
                                            ->  CTE Scan on types t1_1  (cost=0.00..3.98 rows=199 width=8) (actual time=0.007..0.523 rows=66 loops=1)
                                            ->  Hash  (cost=2.61..2.61 rows=1 width=4) (actual time=0.114..0.114 rows=46 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 2kB
                                                  ->  Seq Scan on area_type t2_1  (cost=0.00..2.61 rows=1 width=4) (actual time=0.009..0.063 rows=46 loops=1)
                                                        Filter: (COALESCE(is_group, B'0'::bit(1)) = B'0'::bit(1))
                                                        Rows Removed by Filter: 4
        ->  Index Scan using fki_area_area_type_fkey on area c  (cost=0.42..21004.21 rows=8349 width=8633) (actual time=6.507..173.256 rows=14816 loops=198)
              Index Cond: (type_id = gc.child_type_id)
              Filter: (type_id <> 7)
              Rows Removed by Filter: 2282
Planning time: 9.831 ms
Execution time: 75160.936 ms

用解决方法回答我自己的问题,但不接受,因为它感觉像一个黑客。希望得到更好的答案。

通过使用末尾带有 offset 0 的子查询,将首先计算子查询。然后ST_CONTAINS,这是昂贵的部分,在更小的数据集上运行。这是基于 http://blog.2ndquadrant.com/hinting_at_postgresql/的建议。

select parent_area_id, child_area_id from
    (select p.id as parent_area_id, c.id as child_area_id, p.shape, st_centroid(c.shape) as centroid
    from fog.area p
    inner join fog.area c on p.shape && c.shape
    inner join fog.area_type_group_flattened gc on gc.child_type_id = c.type_id
    inner join fog.area_type_group_flattened gp on gp.child_type_id = p.type_id
    inner join fog.area_type_hierarchy h on h.parent_type_id = gp.parent_type_id and h.child_type_id = gc.parent_type_id
    where p.id = :parent and c.type_id != :excludingchildtype offset 0) sq
where ST_CONTAINS(shape, centroid)

这在 <1 秒内运行

最新更新