Can I use an index to speed up this Postgres query?



Below are the query itself and the EXPLAIN output. I have read a lot about this, but it is still quite complex to me, and I cannot tell whether an index could speed it up. Right now it takes about 10 minutes to run. If anyone has suggestions for optimizing it, I would be very grateful!

Update: I upgraded to PostgreSQL 13 and tuned the configuration for my server with pgtune.leopard.in.ua, and the query now runs in about 60 seconds. Much better. But I am still curious whether I should add an index, or whether it can get any faster.

SELECT
    p.name as playlist_name,
    p.description as playlist_description,
    o.user_id as owner_user_id,
    o.display_name as owner_display_name,
    percentile_disc(0.80) WITHIN GROUP (ORDER BY t.release_date) as percentile_release_date
FROM tracks t
JOIN playlists_tracks pt ON pt.track_id = t.id
JOIN playlists p ON pt.playlist_id = p.id
JOIN owners o ON p.owner_id = o.id
LEFT JOIN contacts c ON o.id = c.owner_id
WHERE t.user_id IS NOT NULL AND t.no_analysis IS NOT true AND c.owner_id IS NULL
GROUP BY p.id, o.user_id, o.display_name
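
For reference, output like the plan below (with actual times and shared/temp buffer counts) comes from running the query under EXPLAIN with the ANALYZE and BUFFERS options; a minimal sketch, assuming that is how the plan was gathered:

-- ANALYZE actually executes the query; BUFFERS adds the "Buffers:" lines seen in the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the same query as shown above
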
QUERY PLAN
GroupAggregate  (cost=1941954.50..5592792.71 rows=25667715 width=136) (actual time=31191.535..59412.822 rows=248285 loops=1)
  Group Key: p.id, o.user_id, o.display_name
  Buffers: shared hit=175465 read=475303, temp read=732236 written=732286
  ->  Gather Merge  (cost=1941954.50..5015269.13 rows=25667715 width=136) (actual time=31191.000..47967.494 rows=43504313 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=175459 read=475302, temp read=732236 written=732286
        ->  Sort  (cost=1940954.44..1956996.76 rows=6416929 width=136) (actual time=31046.494..33961.990 rows=8700863 loops=5)
              Sort Key: p.id, o.user_id, o.display_name
              Sort Method: external merge  Disk: 1195696kB
              Buffers: shared hit=175459 read=475302, temp read=732236 written=732286
              Worker 0:  Sort Method: external merge  Disk: 1191408kB
              Worker 1:  Sort Method: external merge  Disk: 1179872kB
              Worker 2:  Sort Method: external merge  Disk: 1106536kB
              Worker 3:  Sort Method: external merge  Disk: 1184376kB
              ->  Parallel Hash Join  (cost=485715.22..958481.32 rows=6416929 width=136) (actual time=8106.581..24000.603 rows=8700863 loops=5)
                    Hash Cond: (p.owner_id = o.id)
                    Buffers: shared hit=175279 read=475302
                    ->  Parallel Hash Join  (cost=393044.42..825701.28 rows=6417848 width=76) (actual time=5358.248..18777.840 rows=8898524 loops=5)
                          Hash Cond: (pt.playlist_id = p.id)
                          Buffers: shared hit=175103 read=410111
                          ->  Parallel Hash Join  (cost=177558.29..593368.29 rows=6417848 width=8) (actual time=4287.577..15642.345 rows=8898524 loops=5)
                                Hash Cond: (pt.track_id = t.id)
                                Buffers: shared read=410111
                                ->  Parallel Seq Scan on playlists_tracks pt  (cost=0.00..381946.95 rows=12900195 width=8) (actual time=0.700..7507.414 rows=8909338 loops=5)
                                      Buffers: shared read=252945
                                ->  Parallel Hash  (cost=169739.28..169739.28 rows=625521 width=8) (actual time=4283.741..4283.744 rows=881202 loops=5)
                                      Buckets: 8388608 (originally 4194304)  Batches: 1 (originally 1)  Memory Usage: 270752kB
                                      Buffers: shared read=157166
                                      ->  Parallel Seq Scan on tracks t  (cost=0.00..169739.28 rows=625521 width=8) (actual time=2.441..3691.324 rows=881202 loops=5)
                                            Filter: ((user_id IS NOT NULL) AND (no_analysis IS NOT TRUE))
                                            Rows Removed by Filter: 11694
                                            Buffers: shared read=157166
                          ->  Parallel Hash  (cost=193051.06..193051.06 rows=1794806 width=72) (actual time=1062.430..1062.433 rows=1471675 loops=5)
                                Buckets: 8388608  Batches: 1  Memory Usage: 509184kB
                                Buffers: shared hit=175103
                                ->  Parallel Seq Scan on playlists p  (cost=0.00..193051.06 rows=1794806 width=72) (actual time=0.074..342.586 rows=1471675 loops=5)
                                      Buffers: shared hit=175103
                    ->  Parallel Hash  (cost=82895.64..82895.64 rows=782012 width=68) (actual time=2739.889..2739.896 rows=1299732 loops=5)
                          Buckets: 8388608 (originally 4194304)  Batches: 1 (originally 1)  Memory Usage: 441920kB
                          Buffers: shared hit=56 read=65191
                          ->  Hash Anti Join  (cost=24.08..82895.64 rows=782012 width=68) (actual time=591.483..1976.640 rows=1299732 loops=5)
                                Hash Cond: (o.id = c.owner_id)
                                Buffers: shared hit=56 read=65191
                                ->  Parallel Seq Scan on owners o  (cost=0.00..72998.24 rows=782124 width=68) (actual time=0.166..1116.619 rows=1299993 loops=5)
                                      Buffers: shared read=65177
                                ->  Hash  (cost=18.48..18.48 rows=448 width=4) (actual time=591.292..591.294 rows=1309 loops=5)
                                      Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 63kB
                                      Buffers: shared hit=56 read=14
                                      ->  Seq Scan on contacts c  (cost=0.00..18.48 rows=448 width=4) (actual time=589.213..591.081 rows=1309 loops=5)
                                            Buffers: shared hit=56 read=14
Planning:
  Buffers: shared hit=232 read=10
Planning Time: 5.765 ms
JIT:
  Functions: 189
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 29.901 ms, Inlining 346.416 ms, Optimization 1783.670 ms, Emission 813.419 ms, Total 2973.406 ms
Execution Time: 59598.360 ms

You are effectively selecting all of the data, so an index cannot help. PostgreSQL's estimates are accurate, so this plan is probably a good one.

There is little you can do beyond caching more of the data in RAM and getting faster storage. Perhaps setting jit=off will make a small difference. Even throwing more cores at the query will not help much, because most of the time is spent collecting the results and grouping the data.
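
If you want to try the jit=off suggestion, a minimal sketch (session-level first, so it is easy to toggle and compare; the database name in the last line is a placeholder):

-- Disable JIT compilation for the current session and re-run the query to compare timings.
-- The JIT section of the plan above accounts for roughly 3 seconds (Total 2973.406 ms).
SET jit = off;

-- If it helps, the setting can be persisted, for example per database:
-- ALTER DATABASE your_database SET jit = off;   -- "your_database" is a placeholder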
