Postgres优化选择大子集



查询执行按类别过滤的产品评论(200万行)的选择。每个类别都很大,因此它可能包含200K个唯一的product_ids。当选择评论(按类别过滤)时,我们必须从一个大表(200M行)中处理一个大子集(200K行)。

查询例子

explain analyze
select r.*
from (select distinct on (r.product_id) r.*
from reviews as r
join review_product_info as pi on pi.product_id = r.product_id
where (pi.categories @> array['4572']::varchar(256)[] and r.publication_status in ('PUBLISHED'::review_publication_status_enum))
order by r.product_id, r.likes desc, r.created desc) as r
order by r.created desc
offset 0 rows fetch next 21 rows only;
Limit  (cost=309701.99..309702.04 rows=21 width=899) (actual time=1594.890..1743.813 rows=21 loops=1)
->  Sort  (cost=309701.99..309785.83 rows=33534 width=899) (actual time=1577.723..1726.642 rows=21 loops=1)
Sort Key: r.created DESC
Sort Method: top-N heapsort  Memory: 39kB
->  Subquery Scan on r  (cost=1003.71..308797.86 rows=33534 width=899) (actual time=42.000..1686.791 rows=76098 loops=1)
->  Unique  (cost=1003.71..308462.52 rows=33534 width=3644) (actual time=41.995..1660.933 rows=76098 loops=1)
->  Gather Merge  (cost=1003.71..306869.56 rows=637184 width=3644) (actual time=41.991..1596.437 rows=389237 loops=1)
Workers Planned: 4
Workers Launched: 4
->  Incremental Sort  (cost=3.65..229974.75 rows=159296 width=3644) (actual time=17.795..677.632 rows=77847 loops=5)
Sort Key: r_1.product_id, r_1.likes DESC, r_1.created DESC"
Presorted Key: r_1.product_id
Full-sort Groups: 1071  Sort Method: quicksort  Average Memory: 55kB  Peak Memory: 58kB
Pre-sorted Groups: 208  Sort Method: quicksort  Average Memory: 106kB  Peak Memory: 167kB
Worker 0:  Full-sort Groups: 2096  Sort Method: quicksort  Average Memory: 58kB  Peak Memory: 68kB
Pre-sorted Groups: 437  Sort Method: quicksort  Average Memory: 140kB  Peak Memory: 173kB
Worker 1:  Full-sort Groups: 2080  Sort Method: quicksort  Average Memory: 63kB  Peak Memory: 65kB
Pre-sorted Groups: 350  Sort Method: quicksort  Average Memory: 101kB  Peak Memory: 109kB
Worker 2:  Full-sort Groups: 2047  Sort Method: quicksort  Average Memory: 59kB  Peak Memory: 68kB
Pre-sorted Groups: 357  Sort Method: quicksort  Average Memory: 217kB  Peak Memory: 240kB
Worker 3:  Full-sort Groups: 2038  Sort Method: quicksort  Average Memory: 65kB  Peak Memory: 74kB
Pre-sorted Groups: 396  Sort Method: quicksort  Average Memory: 104kB  Peak Memory: 113kB
->  Nested Loop  (cost=0.85..224299.36 rows=159296 width=3644) (actual time=10.717..508.580 rows=77847 loops=5)
->  Parallel Index Only Scan using reviews_categories_idx_test5 on review_product_info pi  (cost=0.42..25169.46 rows=20157 width=7) (actual time=10.606..130.585 rows=16227 loops=5)
Filter: (categories @> '{4572}'::character varying(256)[])
Rows Removed by Filter: 24063
Heap Fetches: 39028
->  Index Scan using reviews_product_id_published_idx on reviews r_1  (cost=0.43..9.41 rows=47 width=899) (actual time=0.008..0.021 rows=5 loops=81134)
Index Cond: ((product_id)::text = (pi.product_id)::text)
Planning Time: 0.697 ms
JIT:
Functions: 36
Options: Inlining false, Optimization false, Expressions true, Deforming true"
Timing: Generation 7.050 ms, Inlining 0.000 ms, Optimization 3.928 ms, Emission 
56.984 ms, Total 67.961 ms"
Execution Time: 1745.852 ms

我们要么使用专用的表来分类,要么在评论表中内联分类作为数组列,结果都是一样的,我们必须扫描表的大子集,然后用不同的on, sort, limit来减少它。

是否有机会可以优化这种类型的查询?

排序是昂贵的,您正在排序r。创建desc很多次了。从内部查询中删除多余的排序应该可以提高性能。

最新更新