I have a complex query, greatly simplified below, running on "PostgreSQL 11.9 on aarch64-unknown-linux-gnu, compiled by aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit", on an AWS Aurora Serverless 2xlarge server (8 cores, 64GB RAM).
I have the following…

- `mv_journey`, a materialized view with ~550M rows containing journeys with origins and destinations, plus some measures about them (how long the journey took, etc.). Columns `from_id` and `from_region` identify the origin, and `to_id` and `to_region` the destination.
- `place_from` and `place_to`, which are computed in the initial steps of a CTE from a function `fn_location_get` and contain `id` and `region` (mapping to `from_id`/`from_region` and `to_id`/`to_region` respectively). These also contain rollup levels of the region, e.g. `country`, `continent`. Typically these return somewhere between ~100 and 20,000 rows.

Later in the CTE I use `place_from` and `place_to` to filter the 550M `mv_journey` rows and `group by` to create a journey-based summary report, e.g. journeys from one country to another.
The simplified query looks like this:
```sql
WITH place_from AS (
  select *
  from fn_location_get(...)
), place_to AS (
  select *
  from fn_location_get(...)
)
select [many dimension columns...]
     , [a few aggregated measure columns]
from mv_journey j
inner join place_from o on j.from_id = o.id
                       and j.from_region = o.region
inner join place_to d on j.to_id = d.id
                     and j.to_region = d.region
where service_type_id = ?
group by [many dimension columns...]
```
I have these indexes on `mv_journey`:

```sql
CREATE INDEX idx_mv_journey_from ON mv_journey (from_id, from_region);
CREATE INDEX idx_mv_journey_to ON mv_journey (to_id, to_region);
```
When I run the query (using `SET LOCAL work_mem = '2048MB'` to get in-memory quicksorts) with a small number of rows (92) in `place_from` and a large number (~18,000) in `place_to`, it uses the query plan below (which includes the CTE steps that generate `place_from` and `place_to`) and runs in about 25 seconds.
"GroupAggregate (cost=530108.64..530129.64 rows=30 width=686) (actual time=13097.187..25408.707 rows=92 loops=1)"
" Group Key: [many dimension columns...]"
" CTE place_from"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396) (actual time=34.275..34.331 rows=92 loops=1)"
" CTE place_to"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396) (actual time=96.287..97.428 rows=18085 loops=1)"
" -> Sort (cost=530088.14..530088.22 rows=30 width=622) (actual time=12935.329..13295.468 rows=1871349 loops=1)"
" Sort Key: [many dimension columns...]"
" Sort Method: quicksort Memory: 826782kB"
" -> Merge Join (cost=529643.68..530087.41 rows=30 width=622) (actual time=4708.780..6021.449 rows=1871349 loops=1)"
" Merge Cond: ((j.to_id = d.id) AND (j.to_region = d.region))"
" -> Sort (cost=529573.85..529719.16 rows=58124 width=340) (actual time=4583.265..4788.625 rows=1878801 loops=1)"
" Sort Key: j.to_id, j.to_region"
" Sort Method: quicksort Memory: 623260kB"
" -> Nested Loop (cost=0.57..524974.25 rows=58124 width=340) (actual time=34.324..3079.815 rows=1878801 loops=1)"
" -> CTE Scan on place_from o (cost=0.00..20.00 rows=1000 width=320) (actual time=34.277..34.432 rows=92 loops=1)"
" -> Index Scan using idx_mv_journey_from on mv_journey j (cost=0.57..524.37 rows=58 width=60) (actual time=0.018..30.022 rows=20422 loops=92)"
" Index Cond: ((from_id = o.id) AND (from_region = o.region))"
" Filter: (service_type_id = 'ALL'::text)"
" Rows Removed by Filter: 81687"
" -> Sort (cost=69.83..72.33 rows=1000 width=320) (actual time=125.505..223.780 rows=1871350 loops=1)"
" Sort Key: d.id, d.region"
" Sort Method: quicksort Memory: 3329kB"
" -> CTE Scan on place_to d (cost=0.00..20.00 rows=1000 width=320) (actual time=96.292..103.677 rows=18085 loops=1)"
"Planning Time: 0.546 ms"
"Execution Time: 25501.827 ms"
The problem: when I swap the from/to populations, i.e. a large number of rows (~18,000) in `place_from` and a small number (92) in `place_to`, the query runs forever. By the way, `mv_journey` has the same number of matching rows in both cases - there aren't more records in one direction than the other.

I have never once had this second query complete without it running for hours and pgAdmin 4 losing its connection to the server. So I can't even get an `EXPLAIN ANALYZE` of it. But I do have the `EXPLAIN`:
"GroupAggregate (cost=474135.40..474152.90 rows=25 width=686)"
" Group Key: [many dimension columns...]"
" CTE place_from"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396)"
" CTE place_to"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396)"
" -> Sort (cost=474114.90..474114.96 rows=25 width=622)"
" Sort Key: [many dimension columns...]"
" -> Merge Join (cost=473720.23..474114.31 rows=25 width=622)"
" Merge Cond: ((j.to_id = d.id) AND (j.to_region = d.region))## Heading ##"
" -> Sort (cost=473650.40..473779.18 rows=51511 width=340)"
" Sort Key: j.to_id, j.to_region"
" -> Nested Loop (cost=0.57..469619.00 rows=51511 width=340)"
" -> CTE Scan on place_from o (cost=0.00..20.00 rows=1000 width=320)"
" -> Index Scan using idx_mv_journey_from on mv_journey j (cost=0.57..469.08 rows=52 width=60)"
" Index Cond: ((from_id = o.id) AND (from_region = o.region))"
" Filter: (service_type_id = 'ALL'::text)"
" -> Sort (cost=69.83..72.33 rows=1000 width=320)"
" Sort Key: d.id, d.region"
" -> CTE Scan on place_to d (cost=0.00..20.00 rows=1000 width=320)"
My assumption was that, with matching indexes on both the from and to sides, Postgres would use the mirror image of the first plan: a merge join for the origin and a nested-loop join for the destination using `idx_mv_journey_to`.

But it looks like the planner's row estimates are way off in both queries. Given that, it seems like pure luck that the first query performs as well as it does.
I tried the following, none of which worked:

- swapping the order of the inner joins so that the destination join comes first
- raising the statistics targets on the join columns:

```sql
ALTER TABLE mv_journey ALTER COLUMN to_id SET STATISTICS 1000; ANALYZE mv_journey;
ALTER TABLE mv_journey ALTER COLUMN from_id SET STATISTICS 1000; ANALYZE mv_journey;
```
I'm guessing that planning is completed before the CTE starts executing? And that's why the planner has no idea what the `fn_location_get` calls that create the `place_from` and `place_to` sets will produce?
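That matches how PostgreSQL 11 behaves: CTEs are optimization fences, and a set-returning SQL/PLpgSQL function with no other information is estimated at a flat 1000 rows (the `rows=1000` visible on both function scans above). If a typical result size is known in advance, one partial mitigation is the `ROWS` option of `ALTER FUNCTION`; a sketch, where 5000 is an illustrative guess rather than a measured value:

```sql
-- Tell the planner to assume fn_location_get returns ~5000 rows
-- instead of the default 1000. This is one fixed estimate per function,
-- so it cannot distinguish a call returning 92 rows from one returning
-- 18,000; 5000 here is only an illustrative middle-ground value.
-- (Add the argument type list after the name if the function is overloaded.)
ALTER FUNCTION fn_location_get ROWS 5000;
```

Because the estimate is fixed per function rather than per call, it can only shift both plans in the same direction, which is why materializing each result into a table with real statistics adapts better.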
`fn_location_get` is a complex function with its own recursive CTE, and I don't want to pull its logic out of the function and into this query.

What is the best way out of this mess?
The most straightforward approach is to create two temporary tables from the results of the function calls, ANALYZE them manually, and then run the query against the temporary tables instead of the function calls.

I found the answer while writing the question… don't use a CTE; use temp tables.
```sql
DROP TABLE IF EXISTS place_from;
CREATE TEMP TABLE place_from AS
select *
from fn_location_get(...);

DROP TABLE IF EXISTS place_to;
CREATE TEMP TABLE place_to AS
select *
from fn_location_get(...);

select [many dimension columns...]
     , [a few aggregated measure columns]
from mv_journey j
inner join place_from o on j.from_id = o.id
                       and j.from_region = o.region
inner join place_to d on j.to_id = d.id
                     and j.to_region = d.region
where service_type_id = ?
group by [many dimension columns...]
```
I think this works because, by the time the reporting `select` is planned, the row counts of the temporary tables are known, so a better plan can be formulated.

The row estimates are still off, though - good enough that the right plan is chosen, but still off:
"GroupAggregate (cost=200682.98..200706.78 rows=34 width=686) (actual time=21233.486..33200.052 rows=92 loops=1)"
" Group Key: [many dimension columns...]"
" -> Sort (cost=200682.98..200683.07 rows=34 width=622) (actual time=21077.807..21443.739 rows=1802571 loops=1)"
" Sort Key: [many dimension columns...]"
" Sort Method: quicksort Memory: 800480kB"
" -> Merge Join (cost=200555.00..200682.12 rows=34 width=622) (actual time=4820.798..6106.722 rows=1802571 loops=1)"
" Merge Cond: ((from_id = o.id) AND (from_region = o.region))"
" -> Sort (cost=199652.79..199677.24 rows=9779 width=340) (actual time=4794.354..5003.954 rows=1810023 loops=1)"
" Sort Key: j.from_id, j.from_region"
" Sort Method: quicksort Memory: 603741kB"
" -> Nested Loop (cost=0.57..199004.67 rows=9779 width=340) (actual time=0.044..3498.767 rows=1810023 loops=1)"
" -> Seq Scan on place_to d (cost=0.00..11.90 rows=190 width=320) (actual time=0.006..0.078 rows=92 loops=1)"
" -> Index Scan using idx_mv_journey_to on mv_journey j (cost=0.57..1046.82 rows=51 width=60) (actual time=0.020..35.055 rows=19674 loops=92)"
" Index Cond: ((j.to_id = d.id) AND (j.to_region = d.region))"
" Filter: (service_type_id = 'ALL'::text)"
" Rows Removed by Filter: 78697"
" -> Sort (cost=902.20..920.02 rows=7125 width=320) (actual time=26.434..121.106 rows=1802572 loops=1)"
" Sort Key: o.id, o.region"
" Sort Method: quicksort Memory: 3329kB"
" -> Seq Scan on place_from o (cost=0.00..446.25 rows=7125 width=320) (actual time=0.016..4.205 rows=18085 loops=1)"
"Planning Time: 0.792 ms"
"Execution Time: 33286.461 ms"
UPDATE: after adding a manual `ANALYZE` following each `CREATE`, as jjanes suggested, the estimates now line up with expectations.
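For completeness, the setup with the manual statistics collection folded in (same placeholder `fn_location_get` arguments as above):

```sql
DROP TABLE IF EXISTS place_from;
CREATE TEMP TABLE place_from AS
select *
from fn_location_get(...);
-- Collect real row counts and per-column distributions so the
-- reporting query is planned against accurate statistics.
ANALYZE place_from;

DROP TABLE IF EXISTS place_to;
CREATE TEMP TABLE place_to AS
select *
from fn_location_get(...);
ANALYZE place_to;
```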