强制 PG 将索引与时间范围一起使用。适用于较小的结果集,不适用于较大的结果集



已编辑:添加了解释分析


我有下表(例如简化(:

CREATE TABLE public.streamscombined
(
    eventtype text COLLATE pg_catalog."default",
    payload jsonb,
    clienttime bigint, //as millis from epoch
)

和B树化合物指数clienttime + eventtype

索引

修剪大量行时索引的正确用法

正确执行以下格式的查询会使用带有过滤大量文档的clienttime的索引。 例如:

explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= 1522550900000 order by clienttime;

=>

Index Scan using "clienttime/type" on streamscombined  (cost=0.56..1781593.82 rows=1135725 width=583)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))

解释分析

Index Scan using "clienttime/type" on streamscombined (cost=0.56..1711616.01 rows=1079021 width=592) (actual time=1.369..13069.861 rows=1074896 loops=1) Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text)) Planning time: 0.208 ms Execution time: 13369.330 ms

结果流式传输结果 我看到数据在 100 毫秒内传入。


当索引修剪较少的行时忽略索引

但是,如果在放松clienttime条件时完全平坦,例如(增加 3 小时(:

explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= (1522540900000 + (3*3600*1000)) order by clienttime;

=>

Gather Merge  (cost=2897003.10..3192254.78 rows=2530552 width=583)
Workers Planned: 2
->  Sort  (cost=2896003.07..2899166.26 rows=1265276 width=583)
Sort Key: clienttime
->  Parallel Seq Scan on streamscombined  (cost=0.00..2110404.89 rows=1265276 width=583)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))

解释分析

Gather Merge (cost=2918263.39..3193771.83 rows=2361336 width=592) (actual time=72505.138..75142.127 rows=2852704 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=2917263.37..2920215.04 rows=1180668 width=592) (actual time=70764.052..71430.200 rows=950901 loops=3) Sort Key: clienttime Sort Method: external merge Disk: 722336kB -> Parallel Seq Scan on streamscombined (cost=0.00..2176719.08 rows=1180668 width=592) (actual time=0.451..57458.888 rows=950901 loops=3) Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text)) Rows Removed by Filter: 7736119 Planning time: 0.109 ms Execution time: 76164.816 ms

结果流式传输结果我已经等了> 5 分钟没有任何结果。


这可能是因为 PG 认为索引不会对结果集进行太多修剪,因此它将使用不同的策略。

然而,这是关键,它似乎完全忽略了我想按clienttime订购的事实,而索引免费给我。

有没有办法强制 PG 使用独立于clienttime条件的实际值的索引?

排序结果很便宜,索引扫描很昂贵,因为它需要许多磁盘查找。

较低的ramdom_page_cost设置会降低索引扫描的成本估算,从而导致索引扫描用于较大的结果集。

相关内容

最新更新