postgresql改进了查询扫描/过滤

我有一个不同对象的属性表

create table attributes(id serial primary key,
object_id int, 
attribute_id text,
text_data text,
int_data int,
timestamp_data timestamp,
state text default 'active');

一个对象将有不同类型的属性，属性值将在text_data或int_data或timestamp_data之间的一列中，取决于属性数据类型。

样本数据在这里

我想检索记录，我的查询是

select * from attributes
where attribute_id = 55 and state='active' 
order by text_data

速度非常慢。

将当前会话的work_mem增加到1gb。使用set命令SET work_mem TO '1 GB';改进从external merge Disk到quicksort的排序方法

但查询执行没有改善。查询执行计划

Gather Merge  (cost=750930.58..1047136.19 rows=2538728 width=128) (actual time=18272.405..27347.556 rows=3462116 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=235635 read=201793
->  Sort  (cost=749930.56..753103.97 rows=1269364 width=128) (actual time=14299.222..15494.774 rows=1154039 loops=3)
Sort Key: text_data
Sort Method: quicksort  Memory: 184527kB
Worker 0:  Sort Method: quicksort  Memory: 266849kB
Worker 1:  Sort Method: quicksort  Memory: 217050kB
Buffers: shared hit=235635 read=201793
->  Parallel Seq Scan on attributes   (cost=0.00..621244.50 rows=1269364 width=128) (actual time=0.083..3410.570 rows=1154039 loops=3)
Filter: ((attribute_id = 185) AND (state = 'active'))
Rows Removed by Filter: 8652494
Buffers: shared hit=235579 read=201793
Planning Time: 0.453 ms
Execution Time: 29135.285 ms

查询在45秒内的总运行时间

Successfully run. Total query runtime: 45 secs 237 msec.
3462116 rows affected.

为了提高过滤和查询的执行时间，在attribute_id和state上创建索引create index attribute_id_state on attributes(attribute_id,state);

Sort  (cost=875797.49..883413.68 rows=3046474 width=128) (actual time=47189.534..49035.361 rows=3462116 loops=1)
Sort Key: text_data
Sort Method: quicksort  Memory: 643849kB
Buffers: shared read=406048
->  Bitmap Heap Scan on attributes   (cost=64642.80..547711.91 rows=3046474 width=128) (actual time=981.857..10348.441 rows=3462116 loops=1)
Recheck Cond: ((attribute_id = 185) AND (state = 'active'))
Heap Blocks: exact=396586
Buffers: shared read=406048
->  Bitmap Index Scan on attribute_id_state  (cost=0.00..63881.18 rows=3046474 width=0) (actual time=751.909..751.909 rows=3462116 loops=1)
Index Cond: ((attribute_id = 185) AND (state = 'active'))
Buffers: shared read=9462
Planning Time: 0.358 ms
Execution Time: 50388.619 ms

但是创建索引后查询变得很慢。

表有2950万行。在900万行中text_data为null。查询返回近300万条记录，占表的10%。

是否有其他索引或其他方法，如改变参数等来改善查询?

一些建议:

ORDER BY子句可以通过索引加速。因此，如果你把排序列放在复合索引中，你可能会得到更快的东西。
```
CREATE INDEX attribute_id_state_data
ON attributes(attribute_id, state, text_data);
```
这个索引与你的问题中的索引是冗余的，所以在创建这个索引时删除那个索引。
您使用SELECT *，这是一个臭名昭著的性能和可维护性反模式。最好为您想要的列命名。当结果集很大时，这一点尤其重要:为什么要在应用程序中不需要的数据上浪费CPU和网络资源呢?假设你想这么做。如果您不需要所有这些列，请从这个SELECT中删除其中的一些。
```
SELECT id, object_id, attribute_id, text_data, int_data,
timestamp_data, state ...
```
你可以在你的索引上使用INCLUDE子句，所以它覆盖了你的查询，也就是说，查询可以完全从索引中得到满足。
```
CREATE INDEX attribute_id_state_data
ON attributes(attribute_id, state, text_data)
INCLUDE (id, object_id, int_data, timestamp_data, state)
```
当您使用这个BTREE索引时，您的查询通过随机访问索引到第一个符合条件的行，然后依次扫描索引来满足。PostgreSQL不需要反弹到表的数据。对于一个大的结果集来说，没有比这更快的了。
如果你从SELECT子句中删除了一些列，你也可以从索引的INCLUDE子句中删除它们。
你ORDER BY一个大对象的TEXT列。无论是在索引创建期间还是在查询期间，都需要对每个记录中的大量数据进行排序。它是离线存储的，所以速度没那么快。您是否可以修改您的应用程序，以使用有限长度的VARCHAR列来代替?这样会更有效率。

相关内容

最新更新

热门标签：