添加具有子select的筛选器后,SQL查询失去了性能



在postgresql中添加了另一个过滤器后,我有一个函数完全失去了它的性能

下面是一个简单的例子,展示了它最初的良好性能。

CREATE OR REPLACE FUNCTION my_function(param_a boolean, param_b boolean )
RETURNS TABLE(blablabla)
LANGUAGE sql
IMMUTABLE
AS $function$
with data as (
select id,amount,account_nr from transfer
)
select * from 
data d
where  param_a or 0.00 <> (select sum(d2.amount)
from data d2
where d2.id = d.id)
$function$
;
(cost=0.25..10.25 rows=1000 width=560)  
(actual time=1162.528..1162.561 rows=306 loops=1)  
Buffers: shared hit=1099180  
Planning time: 2.928 ms  
Execution time: 1162.630 ms

在我添加了另一个带有子选择和计数的过滤器后,我失去了性能。这个计数对性能有影响吗?我能用另一种方法解决它吗?

CREATE OR REPLACE FUNCTION my_function(param_a boolean, param_b boolean )
RETURNS TABLE(blablabla)
LANGUAGE sql
IMMUTABLE
AS $function$
with data as (
select id,amount,account_nr from transfer
)
select * from 
data d
where (param_b or 1 < (select count(d2.account_nr) 
from data d2
where d2.id = d.id
group by d2.account_nr))
and (param_a or 0.00 <> (select sum(d2.amount)
from data d2
where d2.id = d.id))
$function$
;
(cost=0.25..10.25 rows=1000 width=560)  
(actualtime=271191.341..271191.383 rows=306 loops=1)  
Buffers: shared hit=1099180  
Planning time: 2.955 ms 
Execution time: 271191.463 ms

嵌入存储函数的慢速查询是:

with data as (             -- original query from the question.
select id,amount,account_nr from transfer
)
select * 
from data d
where (param_b or 1 < (select count(d2.account_nr) 
from data d2
where d2.id = d.id
group by d2.account_nr)
)
and (param_a or 0.00 <> (select sum(d2.amount)
from data d2
where d2.id = d.id)
)

这有一个毫无意义的公共表表达式。为了简单起见,我们可以去掉它。如果你出于其他目的需要它,你可以随时把它放回去。

并且有几个相关的子查询。让我们将它们重构为一个独立的子查询。从独立的子查询开始:

select id, 
count(account_nr) account_count,
sum(amount) total_amount
from transfer
group by id

此聚合子查询生成transfer表中每个id的帐户数和总金额。眼球的结果说服自己它做了你需要它做的事。

然后,我们可以将其连接到主查询中,并应用您的WHERE条件。

select d.id, d.amount, d.account_nr 
from transfer d
join (
select id, 
count(account_nr) account_count,
sum(amount) total_amount
from transfer
group by id
)  d2 ON d.id = d2.id
where (param_b or 1 < d2.account_count)
and (param_a or 0.00 <> d2.total_amount)

使用独立的子查询可以大大加快速度;有时,查询规划器决定需要多次重新评估依赖子查询。

以下索引将帮助子查询更快地运行。

CREATE INDEX id_details ON transfer (id) INCLUDE (account_nr, amount);

说服自己这是有效的,而且足够快。(我没有调试它,因为我没有你的数据。(你需要用truefalse代替param_aparam_b来测试它。

然后,也只有到那时,将它放入存储的函数中。

最新更新