已编辑:添加了解释分析
我有下表(例如简化(:
CREATE TABLE public.streamscombined
(
eventtype text COLLATE pg_catalog."default",
payload jsonb,
clienttime bigint, //as millis from epoch
)
和B树化合物指数clienttime + eventtype
修剪大量行时索引的正确用法
正确执行以下格式的查询会使用带有过滤大量文档的clienttime
的索引。 例如:
explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= 1522550900000 order by clienttime;
=>
Index Scan using "clienttime/type" on streamscombined (cost=0.56..1781593.82 rows=1135725 width=583)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))
解释分析
Index Scan using "clienttime/type" on streamscombined (cost=0.56..1711616.01 rows=1079021 width=592) (actual time=1.369..13069.861 rows=1074896 loops=1)
Index Cond: ((clienttime <= '1522540900000'::bigint) AND (eventtype = 'typeA'::text))
Planning time: 0.208 ms
Execution time: 13369.330 ms
结果:流式传输结果 我看到数据在 100 毫秒内传入。
当索引修剪较少的行时忽略索引
但是,如果在放松clienttime
条件时完全平坦,例如(增加 3 小时(:
explain SELECT * FROM streamscombined WHERE eventtype='typeA' AND clienttime <= (1522540900000 + (3*3600*1000)) order by clienttime;
=>
Gather Merge (cost=2897003.10..3192254.78 rows=2530552 width=583)
Workers Planned: 2
-> Sort (cost=2896003.07..2899166.26 rows=1265276 width=583)
Sort Key: clienttime
-> Parallel Seq Scan on streamscombined (cost=0.00..2110404.89 rows=1265276 width=583)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))
解释分析
Gather Merge (cost=2918263.39..3193771.83 rows=2361336 width=592) (actual time=72505.138..75142.127 rows=2852704 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=2917263.37..2920215.04 rows=1180668 width=592) (actual time=70764.052..71430.200 rows=950901 loops=3)
Sort Key: clienttime
Sort Method: external merge Disk: 722336kB
-> Parallel Seq Scan on streamscombined (cost=0.00..2176719.08 rows=1180668 width=592) (actual time=0.451..57458.888 rows=950901 loops=3)
Filter: ((clienttime <= '1522551700000'::bigint) AND (eventtype = 'typeA'::text))
Rows Removed by Filter: 7736119
Planning time: 0.109 ms
Execution time: 76164.816 ms
结果:流式传输结果我已经等了> 5 分钟没有任何结果。
这可能是因为 PG 认为索引不会对结果集进行太多修剪,因此它将使用不同的策略。
然而,这是关键,它似乎完全忽略了我想按clienttime
订购的事实,而索引免费给我。
有没有办法强制 PG 使用独立于clienttime
条件的实际值的索引?
排序结果很便宜,索引扫描很昂贵,因为它需要许多磁盘查找。
较低的ramdom_page_cost设置会降低索引扫描的成本估算,从而导致索引扫描用于较大的结果集。