How to implement a query parameter in Postgres based on a row of a constants table?

I have the following query:

-- Really fast
-- Explain Analyze: https://explain.depesz.com/s/lsq8
with start_time as (
select '2022-06-02T17:45:43Z':: timestamp with time zone as time
)
-- 200x slower
-- Explain Analyze: https://explain.depesz.com/s/CavD
with start_time as (
select last_update as time
from processed.last_update
where table_name = 'token_supply'
)
select  ident as ma_id
,   sum(quantity) as quantity
,   sum(quantity) filter (where quantity > 0) as quantity_minted
from public.ma_tx_mint
where exists (
select id
from public.tx
where tx.id = ma_tx_mint.tx_id
and exists (
select id
from public.block
cross join start_time
where block.id = tx.block_id
and block.time >= start_time.time
)
)
group by ident

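I am trying to query records that were added to a table after a specified time. If the time is hard-coded, as in the first start_time, the query runs in 0.2 seconds. With the second start_time, which retrieves the time dynamically, the query runs for 40 seconds.

How can I get Postgres to treat these two cases the same, so that the ma_tx_mint table is queried dynamically based on a row of another table?

Version: PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit

Tables: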

create table public.ma_tx_mint (
id bigint
, quantity numeric
, tx_id bigint
, ident bigint
, primary key(id)
);
create table public.tx (
id bigint
, block_id bigint
, primary key(id)
);
create table public.block (
id bigint
, time timestamp with time zone
, primary key(id)
);
create table processed.last_update (
table_name varchar
, last_update timestamp with time zone
, primary key(table_name)
);

Explain Analyze:

Fast: https://explain.depesz.com/s/lsq8
Slow: https://explain.depesz.com/s/CavD

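Problem

Postgres keeps column statistics, including histogram bounds. Your constant timestamp (the fast variant) seems to be close to the latest few rows, so Postgres knows to expect very few qualifying rows from table block. This estimate turns out to be close enough: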

Index Scan using idx_block_time on block (cost=0.43..14.29 rows=163 width=8) (actual time=0.825..1.146 rows=891 loops=1)

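With the subselect fetching an unknown timestamp (the slow variant), Postgres does not know what to expect for the timestamp and plans for something in between. Unfortunately, your table block seems to have around 7.5 million rows. For an unknown filter value Postgres assumes that about 1/3 of all rows qualify, i.e. ~ 2.5 million: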

Index Scan using idx_block_time on block (cost=0.43..127,268.65 rows=2,491,327 width=16) (actual time=1.261..1.723 rows=653 loops=3)

So Postgres plans for millions of qualifying rows and uses sequential scans, which turns out to be a bad choice for the few rows that actually qualify.
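
To see the difference in row estimates without running the full query, one can look at the statistics and compare the two filter styles on block alone. This is a minimal sketch that assumes the tables from the question and up-to-date statistics (run ANALYZE public.block first if in doubt). The exact numbers will differ per installation.

```sql
-- Histogram bounds the planner can use when the comparison value is a known constant
SELECT n_distinct, correlation, histogram_bounds
FROM   pg_stats
WHERE  schemaname = 'public'
AND    tablename  = 'block'
AND    attname    = 'time';

-- Constant: the estimate is derived from the histogram
EXPLAIN
SELECT id
FROM   public.block
WHERE  time >= '2022-06-02T17:45:43Z'::timestamptz;

-- Value only known at execution time: the planner falls back to its
-- default selectivity for inequalities (about one third of the table)
EXPLAIN
SELECT id
FROM   public.block
WHERE  time >= (SELECT last_update
                FROM   processed.last_update
                WHERE  table_name = 'token_supply');
```

Comparing the rows= figure in the two plans should reproduce the gap between the fast and the slow variant.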

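Possible solution

If you know a lower bound for the timestamp, you can add it as a (logically redundant) additional predicate to guide Postgres towards a more suitable plan: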

SELECT ident AS ma_id
     , sum(quantity) AS quantity
     , sum(quantity) FILTER (WHERE quantity > 0) AS quantity_minted
FROM   public.ma_tx_mint m
WHERE  EXISTS (
   SELECT FROM public.tx
   WHERE  tx.id = m.tx_id
   AND    EXISTS (
      SELECT FROM public.block b
      WHERE  b.id = tx.block_id
      AND    b.time >= (SELECT last_update FROM processed.last_update WHERE table_name = 'token_supply')
      AND    b.time >= '2022-06-01 00:00+0'  -- !!! some known minimum bound
      )
   )
GROUP  BY 1;

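Also, since table_name is the PK of processed.last_update, we know that the subquery returns only a single row, so we can use a simple scalar subquery. That should be a bit faster already.
But the main point is the added minimum bound. If it is selective enough, Postgres will know to switch to index scans, like in your fast plan.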
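
If no useful lower bound is known, another option is to fetch the timestamp first and then run the main query with that value embedded as a literal, so the planner sees a constant like in the fast variant. The following is only a sketch of that idea, not part of the original posts: the function name token_supply_since_last_update and its signature are hypothetical, and it assumes the tables from the question.

```sql
-- Hypothetical helper; name and return signature are assumptions for illustration
CREATE OR REPLACE FUNCTION public.token_supply_since_last_update()
  RETURNS TABLE (ma_id bigint, quantity numeric, quantity_minted numeric)
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   _start timestamptz;
BEGIN
   -- Step 1: fetch the single timestamp (table_name is the PK, so at most one row)
   SELECT lu.last_update
   INTO   _start
   FROM   processed.last_update lu
   WHERE  lu.table_name = 'token_supply';

   -- Step 2: build the query with the timestamp as a quoted literal (%L),
   -- so it is planned with a known constant on every call
   RETURN QUERY EXECUTE format(
      $q$
      SELECT m.ident
           , sum(m.quantity)
           , sum(m.quantity) FILTER (WHERE m.quantity > 0)
      FROM   public.ma_tx_mint m
      WHERE  EXISTS (
         SELECT FROM public.tx
         WHERE  tx.id = m.tx_id
         AND    EXISTS (
            SELECT FROM public.block b
            WHERE  b.id = tx.block_id
            AND    b.time >= %L::timestamptz
            )
         )
      GROUP  BY m.ident
      $q$, _start);
END
$func$;

-- Usage
SELECT * FROM public.token_supply_since_last_update();
```

The trade-off is that the query is planned anew on every call, which is usually negligible next to a 40-second execution. If no row for 'token_supply' exists, the literal becomes NULL and the function returns no rows.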
Aside

Casting a timestamp constant to timestamptz is typically a bad idea:

'2022-06-02T17:45:43Z'::timestamptz
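If the literal carries no zone designator, this assumes the time zone of the current session, which may or may not be as intended (a trailing Z, as here, already denotes UTC). Rather be explicit: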

'2022-06-02T17:45:43Z'::timestamp AT TIME ZONE 'UTC'
'2022-06-02T17:45:43Z+0'::timestamptz

... or whatever time zone you want to use. See:
Ignoring time zones altogether in Rails and PostgreSQL
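
The effect of the session time zone can be checked with a small, self-contained comparison. This sketch uses a literal without any zone designator and an arbitrarily chosen non-UTC zone; both are assumptions made only for illustration.

```sql
BEGIN;
SET LOCAL timezone = 'Europe/Vienna';  -- arbitrary non-UTC zone, reverted by ROLLBACK

SELECT '2022-06-02T17:45:43'::timestamptz
     = '2022-06-02T17:45:43+00'::timestamptz AS bare_cast_is_utc      -- false in this session
     , '2022-06-02T17:45:43'::timestamp AT TIME ZONE 'UTC'
     = '2022-06-02T17:45:43+00'::timestamptz AS explicit_zone_is_utc; -- true in any session

ROLLBACK;
```

The bare cast reads the literal as Vienna local time, while the AT TIME ZONE form names the zone itself and therefore gives the same instant no matter how the session is configured.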
