I have the following query:
-- Really fast
-- Explain Analyze: https://explain.depesz.com/s/lsq8
with start_time as (
  select '2022-06-02T17:45:43Z'::timestamp with time zone as time
)
-- 200x slower
-- Explain Analyze: https://explain.depesz.com/s/CavD
with start_time as (
  select last_update as time
  from processed.last_update
  where table_name = 'token_supply'
)
select ident as ma_id
  , sum(quantity) as quantity
  , sum(quantity) filter (where quantity > 0) as quantity_minted
from public.ma_tx_mint
where exists (
  select id
  from public.tx
  where tx.id = ma_tx_mint.tx_id
    and exists (
      select id
      from public.block
      cross join start_time
      where block.id = tx.block_id
        and block.time >= start_time.time
    )
)
group by ident
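I am trying to query records that were added to a table after a specified time. If the time is hard-coded, as in the first start_time, the query runs in 0.2 seconds. With the second start_time, where the time is retrieved dynamically, the query runs for 40 seconds.
How can I get Postgres to treat these two cases the same and dynamically query the ma_tx_mint table based on a row in another table?
Version: PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
Tables: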
create table public.ma_tx_mint (
  id bigint
  , quantity numeric
  , tx_id bigint
  , ident bigint
  , primary key(id)
);
create table public.tx (
  id bigint
  , block_id bigint
  , primary key(id)
);
create table public.block (
  id bigint
  , time timestamp with time zone
  , primary key(id)
);
create table processed.last_update (
  table_name varchar
  , last_update timestamp with time zone
  , primary key(table_name)
);
Explain Analyze:
Fast: https://explain.depesz.com/s/lsq8
Slow: https://explain.depesz.com/s/CavD

Problem
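Postgres has column statistics, including histogram bounds. Your constant timestamp (the fast variant) seems to be close to the latest few rows, so Postgres knows to expect very few qualifying rows from table block. This estimate turns out to be OK:
Index Scan using idx_block_time on block (cost=0.43..14.29 rows=163 width=8) (actual time=0.825..1.146 rows=891 loops=1)
With the subselect fetching a yet unknown timestamp (the slow variant), Postgres does not know what to expect for the timestamp and plans for something in between. Unfortunately, your table block seems to have around 7.5 million rows, so for an unknown filter value Postgres expects around 1/3 of all rows to qualify, i.e. ~ 2.5 million:
Index Scan using idx_block_time on block (cost=0.43..127,268.65 rows=2,491,327 width=16) (actual time=1.261..1.723 rows=653 loops=3)
So Postgres plans for millions of qualifying rows and uses sequential scans, which turns out to be a bad choice for the few rows that actually qualify.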
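To see what the planner has to work with, the statistics can be inspected directly. This is only a sketch: it assumes the table has been analyzed recently, and the 1/3 factor is the default selectivity Postgres applies to an inequality against a value it does not know at planning time.

```sql
-- Histogram bounds available to the planner when block.time
-- is compared against a constant
SELECT histogram_bounds
FROM   pg_stats
WHERE  schemaname = 'public'
AND    tablename  = 'block'
AND    attname    = 'time';

-- Estimated row count of the table, and the default 1/3 guess
-- used when the comparison value is unknown at planning time
SELECT reltuples::bigint       AS estimated_rows
     , (reltuples / 3)::bigint AS default_inequality_estimate
FROM   pg_class
WHERE  oid = 'public.block'::regclass;
```

If the second number is close to the rows=2,491,327 estimate in the slow plan, that supports the explanation above.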
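
Possible solution
If you know a lower bound for the timestamp, you can add it as a (logically redundant) additional predicate to guide Postgres toward a more suitable plan: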
SELECT ident AS ma_id
     , sum(quantity) AS quantity
     , sum(quantity) FILTER (WHERE quantity > 0) AS quantity_minted
FROM   public.ma_tx_mint m
WHERE  EXISTS (
   SELECT FROM public.tx
   WHERE  tx.id = m.tx_id
   AND    EXISTS (
      SELECT FROM public.block b
      WHERE  b.id = tx.block_id
      AND    b.time >= (SELECT last_update FROM processed.last_update WHERE table_name = 'token_supply')
      AND    b.time >= '2022-06-01 00:00+0'  -- !!! some known minimum bound
      )
   )
GROUP  BY 1;
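Also, since table_name is the PK of processed.last_update, we know the subquery returns only a single row, so we can use a simple scalar subquery. That should already be a bit faster.
But the main point is the added minimum bound. If it is selective enough, Postgres will know to switch to index scans, like in your fast plan.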
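If no useful lower bound is known, a different technique (not part of the suggestion above) is to fetch the timestamp first and then run the main query with that value embedded as a literal, so the planner sees a constant just like in the fast variant. A minimal sketch with dynamic SQL in PL/pgSQL follows; the function name token_supply_since_last_update is hypothetical, and I have not run this against your data.

```sql
-- Hypothetical helper: the function name is made up for illustration
CREATE OR REPLACE FUNCTION token_supply_since_last_update()
  RETURNS TABLE (ma_id bigint, quantity numeric, quantity_minted numeric)
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   _since timestamptz;
BEGIN
   -- Step 1: fetch the dynamic lower bound
   SELECT l.last_update
   INTO   _since
   FROM   processed.last_update l
   WHERE  l.table_name = 'token_supply';

   -- Step 2: embed it as a literal (%L), so the query is planned
   -- with a known constant and can use the column statistics
   RETURN QUERY EXECUTE format(
      $q$
      SELECT m.ident
           , sum(m.quantity)
           , sum(m.quantity) FILTER (WHERE m.quantity > 0)
      FROM   public.ma_tx_mint m
      WHERE  EXISTS (
         SELECT FROM public.tx
         WHERE  tx.id = m.tx_id
         AND    EXISTS (
            SELECT FROM public.block b
            WHERE  b.id = tx.block_id
            AND    b.time >= %L::timestamptz
            )
         )
      GROUP  BY m.ident
      $q$, _since);
END
$func$;

SELECT * FROM token_supply_since_last_update();
```

The trade-off is that the statement is planned anew on every call, which should be negligible next to a 40-second runtime.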
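Aside: casting timestamp constants to timestamptz is typically a bad idea:
'2022-06-02T17:45:43'::timestamptz
Without an explicit offset in the literal, this assumes the time zone of the current session, which may or may not be as intended. (A trailing Z, as in your literal, is read as UTC.) Rather be explicit:
'2022-06-02T17:45:43'::timestamp AT TIME ZONE 'UTC'
'2022-06-02T17:45:43+0'::timestamptz
... or whatever time zone you want to use. See:
- Ignoring time zones altogether in Rails and PostgreSQL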
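A small sketch of how the session setting changes the result for a literal that carries no offset. The time zone names are arbitrary picks for illustration.

```sql
BEGIN;

SET LOCAL timezone = 'America/New_York';
-- The bare literal is read as New York local time: a different instant
SELECT '2022-06-02 17:45:43'::timestamptz
     = '2022-06-02 17:45:43+0'::timestamptz AS same_instant;  -- false

SET LOCAL timezone = 'UTC';
-- Same expression, but now the bare literal is read as UTC
SELECT '2022-06-02 17:45:43'::timestamptz
     = '2022-06-02 17:45:43+0'::timestamptz AS same_instant;  -- true

SET LOCAL timezone = 'America/New_York';
-- Explicit variant: the result no longer depends on the session setting
SELECT '2022-06-02 17:45:43'::timestamp AT TIME ZONE 'UTC'
     = '2022-06-02 17:45:43+0'::timestamptz AS same_instant;  -- true

ROLLBACK;
```

SET LOCAL only lasts until the end of the transaction, so the session's own setting is untouched after the ROLLBACK.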