查询执行按类别过滤的产品评论(200万行)的选择。每个类别都很大,因此它可能包含200K个唯一的product_ids。当选择评论(按类别过滤)时,我们必须从一个大表(200M行)中处理一个大子集(200K行)。
查询例子
explain analyze
select r.*
from (select distinct on (r.product_id) r.*
from reviews as r
join review_product_info as pi on pi.product_id = r.product_id
where (pi.categories @> array['4572']::varchar(256)[] and r.publication_status in ('PUBLISHED'::review_publication_status_enum))
order by r.product_id, r.likes desc, r.created desc) as r
order by r.created desc
offset 0 rows fetch next 21 rows only;
Limit (cost=309701.99..309702.04 rows=21 width=899) (actual time=1594.890..1743.813 rows=21 loops=1)
-> Sort (cost=309701.99..309785.83 rows=33534 width=899) (actual time=1577.723..1726.642 rows=21 loops=1)
Sort Key: r.created DESC
Sort Method: top-N heapsort Memory: 39kB
-> Subquery Scan on r (cost=1003.71..308797.86 rows=33534 width=899) (actual time=42.000..1686.791 rows=76098 loops=1)
-> Unique (cost=1003.71..308462.52 rows=33534 width=3644) (actual time=41.995..1660.933 rows=76098 loops=1)
-> Gather Merge (cost=1003.71..306869.56 rows=637184 width=3644) (actual time=41.991..1596.437 rows=389237 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Incremental Sort (cost=3.65..229974.75 rows=159296 width=3644) (actual time=17.795..677.632 rows=77847 loops=5)
Sort Key: r_1.product_id, r_1.likes DESC, r_1.created DESC"
Presorted Key: r_1.product_id
Full-sort Groups: 1071 Sort Method: quicksort Average Memory: 55kB Peak Memory: 58kB
Pre-sorted Groups: 208 Sort Method: quicksort Average Memory: 106kB Peak Memory: 167kB
Worker 0: Full-sort Groups: 2096 Sort Method: quicksort Average Memory: 58kB Peak Memory: 68kB
Pre-sorted Groups: 437 Sort Method: quicksort Average Memory: 140kB Peak Memory: 173kB
Worker 1: Full-sort Groups: 2080 Sort Method: quicksort Average Memory: 63kB Peak Memory: 65kB
Pre-sorted Groups: 350 Sort Method: quicksort Average Memory: 101kB Peak Memory: 109kB
Worker 2: Full-sort Groups: 2047 Sort Method: quicksort Average Memory: 59kB Peak Memory: 68kB
Pre-sorted Groups: 357 Sort Method: quicksort Average Memory: 217kB Peak Memory: 240kB
Worker 3: Full-sort Groups: 2038 Sort Method: quicksort Average Memory: 65kB Peak Memory: 74kB
Pre-sorted Groups: 396 Sort Method: quicksort Average Memory: 104kB Peak Memory: 113kB
-> Nested Loop (cost=0.85..224299.36 rows=159296 width=3644) (actual time=10.717..508.580 rows=77847 loops=5)
-> Parallel Index Only Scan using reviews_categories_idx_test5 on review_product_info pi (cost=0.42..25169.46 rows=20157 width=7) (actual time=10.606..130.585 rows=16227 loops=5)
Filter: (categories @> '{4572}'::character varying(256)[])
Rows Removed by Filter: 24063
Heap Fetches: 39028
-> Index Scan using reviews_product_id_published_idx on reviews r_1 (cost=0.43..9.41 rows=47 width=899) (actual time=0.008..0.021 rows=5 loops=81134)
Index Cond: ((product_id)::text = (pi.product_id)::text)
Planning Time: 0.697 ms
JIT:
Functions: 36
Options: Inlining false, Optimization false, Expressions true, Deforming true"
Timing: Generation 7.050 ms, Inlining 0.000 ms, Optimization 3.928 ms, Emission
56.984 ms, Total 67.961 ms"
Execution Time: 1745.852 ms
我们要么使用专用的表来分类,要么在评论表中内联分类作为数组列,结果都是一样的,我们必须扫描表的大子集,然后用不同的on, sort, limit来减少它。
是否有机会可以优化这种类型的查询?
排序是昂贵的,您正在排序r。创建desc很多次了。从内部查询中删除多余的排序应该可以提高性能。