对于几乎相同的查询,有非常不同的查询计划



我正在使用PostgreSQL 10 + pg_trgm扩展。

表布局:

Column    |       Type        | Collation | Nullable |        Default        | Storage  |
--------------+-------------------+-----------+----------+-----------------------+----------+
id           | integer           |           | not null |                       | plain    | 
reps         | integer           |           |          | 1                     | plain    | 
user         | integer           |           |          |                       | plain    | 
ip           | character varying |           | not null | ''::character varying | extended | 
visittime    | integer           |           |          |                       | plain    | 
domain       | character varying |           |          |                       | extended | 
address      | text              |           |          |                       | extended | 
method       | character varying |           |          |                       | extended | 
mime         | character varying |           |          |                       | extended | 
duration     | integer           |           |          |                       | plain    | 
size         | bigint            |           |          |                       | plain    | 
req_status   | integer           |           |          |                       | plain    | 
http_status  | integer           |           |          |                       | plain    | 
xproxymeta   | integer           |           |          |                       | plain    | 
Indexes:
"http_requests_pkey" PRIMARY KEY, btree (id)
"http_trgm_idx" gin (address gin_trgm_ops)
"md_userid_idx" btree ("user" DESC)
"md_visittime_idx" btree (visittime DESC)
Foreign-key constraints:
"http_requests_user_fkey" FOREIGN KEY ("user") REFERENCES username_id(id)
Triggers:
add_occupied_space_record_num AFTER INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE add_occupied_space_record_num_func()
count_repeated_records BEFORE INSERT ON http_requests FOR EACH ROW EXECUTE PROCEDURE count_repeated_records_func()
delete_occupied_space_record_num AFTER DELETE ON http_requests FOR EACH ROW EXECUTE PROCEDURE delete_occupied_space_record_num_func()

请注意,在address列上有一个GIN Trigram全文索引。这个表目前有~ 1000万条记录。

现在这两个相同的查询会产生非常不同的计划。第一个需要56ms,第二个需要~25秒。

# explain analyze select * from http_requests where address ilike '%abc%' order by visittime desc limit 10;
QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.43..60.93 rows=10 width=209) (actual time=2.862..21.725 rows=10 loops=1)
->  Index Scan using md_visittime_idx on http_requests  (cost=0.43..500074.21 rows=82654 width=209) (actual time=2.861..21.719 rows=10 loops=1)
Filter: (address ~~* '%abc%'::text)
Rows Removed by Filter: 6663
Planning time: 0.279 ms
Execution time: 21.751 ms
(6 rows)

现在是相同的查询,只是搜索模式不同:xyz

# explain analyze select * from http_requests where address ilike '%xyz%' order by visittime desc limit 10;
QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=5246.47..5246.50 rows=10 width=209) (actual time=23367.849..23367.870 rows=10 loops=1)
->  Sort  (cost=5246.47..5248.52 rows=818 width=209) (actual time=23367.846..23367.848 rows=10 loops=1)
Sort Key: visittime DESC
Sort Method: top-N heapsort  Memory: 34kB
->  Bitmap Heap Scan on http_requests  (cost=2090.34..5228.79 rows=818 width=209) (actual time=17.202..23352.607 rows=18926 loops=1)
Recheck Cond: (address ~~* '%xyz%'::text)
Heap Blocks: exact=18243
->  Bitmap Index Scan on http_trgm_idx  (cost=0.00..2090.14 rows=818 width=0) (actual time=12.342..12.342 rows=18926 loops=1)
Index Cond: (address ~~* '%xyz%'::text)
Planning time: 0.190 ms
Execution time: 23368.164 ms
(11 rows)

为什么计划如此不同,我如何修复缓慢的查询?

PostgreSQL知道有很多行包含abc,但是很少包含xyz

因此,在第一种情况下,它正确地决定使用支持ORDER BY子句的索引,并丢弃不符合ILIKE条件的行,直到找到10个结果行。

这种策略在第二种情况下就不那么有效了,因为PostgreSQL在找到10个结果之前必须搜索表的更大一部分,所以它决定使用三元组索引并对少数结果进行排序。

不幸的是,PostgreSQL的估计是错误的(18926行而不是估计的818行),所以位图堆扫描需要很长时间。

尝试'ANALYZE'表,也许使用更高的default_statistics_target,以获得更好的估计。

如果所有都失败,将条件更改为类似

的内容
WHERE address || '' ILIKE '%abc%'

让PostgreSQL不使用三元组索引

最新更新