有可能加速这个postgres/sql(读取)查询吗



我有下表:

CREATE TABLE mosmix_data (
id SERIAL PRIMARY KEY, 
created_at TIMESTAMP without time zone default (now() at time zone 'utc'),
TimeStep timestamp, 
name char(8),
description text, 
PPPP float8, 
E_PPP float8, TX float8, TTT float8, E_TTT float8, Td float8,
E_Td float8, TN float8, TG float8, TM float8, T5cm float8,
DD float8, E_DD float8, FF float8, E_FF float8, FX1 float8,
FX3 float8, FX625 float8, FX640 float8, FX655 float8, FXh float8,
FXh25 float8, FXh40 float8, FXh55 float8, N float8, Neff float8,
Nlm float8, Nh float8, Nm float8, Nl float8, N05 float8, VV float8,
VV10 float8, wwM float8, wwM6 float8, wwMh float8, wwMd float8,
ww float8, ww3 float8, W1W2 float8, wwP float8, wwP6 float8,
wwPh float8, wwPd float8, wwZ float8, wwZ6 float8, wwZh float8,
wwD float8, wwD6 float8, wwDh float8, wwC float8, wwC6 float8,
wwCh float8, wwT float8, wwT6 float8, wwTh float8, wwTd float8,
wwS float8, wwS6 float8, wwSh float8, wwL float8, wwL6 float8,
wwLh float8, wwF float8, wwF6 float8, wwFh float8, DRR1 float8,
RR6c float8, RRhc float8, RRdc float8, RR1c float8, RRS1c float8,
RRL1c float8, RR3c float8, RRS3c float8, R101 float8, R102 float8,
R103 float8, R105 float8, R107 float8, R110 float8, R120 float8,
R130 float8, R150 float8, RR1o1 float8, RR1w1 float8, RR1u1 float8,
R600 float8, R602 float8, R610 float8, R650 float8, Rh00 float8,
Rh02 float8, Rh10 float8, Rh50 float8, Rd00 float8, Rd02 float8,
Rd10 float8, Rd50 float8, SunD float8, RSunD float8, PSd00 float8,
PSd30 float8, PSd60 float8, RRad1 float8, Rad1h float8, SunD1 float8,
SunD3 float8, PEvap float8, WPc11 float8, WPc31 float8, WPc61 float8,
WPch1 float8, WPcd1 float8,
Point Geometry(Point, 4326)
);

我还创建了以下索引:

CREATE INDEX ON mosmix_data using gist (Point);
CREATE INDEX name_idx ON mosmix_data(name);

首先,我选择了一个靠近中/长位置的气象站

select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1;

结果如下:

name   
----------
G385   
(1 row)

然后,我根据之前查询的名称获得该站的数据:

select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name like '%G385%' 
and timestep between now() and now() + interval '4 days'  
order by timestep ASC;

并获取数据。

我能做些什么来加快查询速度吗?

以下是解释者:

>explain select name from mosmix_data where timestep > now() order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' limit 1;
QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
Limit  (cost=0.41..1.99 rows=1 width=17)
->  Index Scan using mosmix_data_point_idx on mosmix_data  (cost=0.41..2294305.65 rows=1453301 width=17)
Order By: (point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
Filter: (timestep > now())
(4 rows)

>explain select timestep, name, Rh02, TX, TN from mosmix_data where name like '%G385%' and timestep between now() and now() + interval '4 days'  order by timestep ASC;
QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
Gather Merge  (cost=391389.46..391395.31 rows=48 width=41)
Workers Planned: 6
->  Sort  (cost=390389.36..390389.38 rows=8 width=41)
Sort Key: timestep
->  Parallel Seq Scan on mosmix_data  (cost=0.00..390389.24 rows=8 width=41)
Filter: ((name ~~ '%G385%'::text) AND (timestep >= now()) AND (timestep <= (now() + '4 days'::interval)))
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
(9 rows)

谢谢!

更新根据@Frank Heikens的请求添加了EXPLAIN(ANALYZE, VERBOSE, BUFFERS)

EXPLAIN(ANALYZE, VERBOSE, BUFFERS) select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1;
   QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.41..2.00 rows=1 width=17) (actual time=1.549..1.550 rows=1 loops=1)
Output: name, ((point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry))
Buffers: shared hit=152
->  Index Scan using mosmix_data_point_idx on public.mosmix_data  (cost=0.41..2261401.29 rows=1427043 width=17) (actual time=1.547..1.547 rows=1 loops=1)
Output: name, (point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
Order By: (mosmix_data.point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
Filter: (mosmix_data.timestep > now())
Rows Removed by Filter: 3
Buffers: shared hit=152
Planning Time: 0.201 ms
Execution Time: 1.585 ms
(11 rows)

Gather Merge  (cost=391380.39..391503.32 rows=1008 width=41) (actual time=169.483..195.812 rows=96 loops=1)
Output: timestep, name, rh02, tx, tn
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=384456
->  Sort  (cost=390380.29..390380.71 rows=168 width=41) (actual time=112.120..112.121 rows=14 loops=7)
Output: timestep, name, rh02, tx, tn
Sort Key: mosmix_data.timestep
Sort Method: quicksort  Memory: 25kB
Buffers: shared hit=384456
Worker 0:  actual time=82.637..82.638 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 2.527 ms, Inlining 0.000 ms, Optimization 1.551 ms, Emission 20.164 ms, Total 24.242 ms
Buffers: shared hit=19794
Worker 1:  actual time=112.582..112.586 rows=96 loops=1
Sort Method: quicksort  Memory: 32kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.420 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.794 ms, Total 14.211 ms
Buffers: shared hit=40594
Worker 2:  actual time=82.642..82.643 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 2.526 ms, Inlining 0.000 ms, Optimization 1.551 ms, Emission 20.164 ms, Total 24.242 ms
Buffers: shared hit=19147
Worker 3:  actual time=112.605..112.606 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.400 ms, Inlining 0.000 ms, Optimization 1.013 ms, Emission 11.582 ms, Total 13.995 ms
Buffers: shared hit=40908
Worker 4:  actual time=112.606..112.607 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.401 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.395 ms, Total 13.794 ms
Buffers: shared hit=42131
Worker 5:  actual time=112.604..112.604 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.406 ms, Inlining 0.000 ms, Optimization 1.016 ms, Emission 11.835 ms, Total 14.258 ms
Buffers: shared hit=41171
->  Parallel Seq Scan on public.mosmix_data  (cost=0.00..390374.08 rows=168 width=41) (actual time=103.656..112.018 rows=14 loops=7)
Timing: Generation 1.420 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.794 ms, Total 14.211 ms
Buffers: shared hit=40594
Worker 2:  actual time=82.642..82.643 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 2.526 ms, Inlining 0.000 ms, Optimization 1.551 ms, Emission 20.164 ms, Total 24.242 ms
Buffers: shared hit=19147
Worker 3:  actual time=112.605..112.606 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.400 ms, Inlining 0.000 ms, Optimization 1.013 ms, Emission 11.582 ms, Total 13.995 ms
Buffers: shared hit=40908
Worker 4:  actual time=112.606..112.607 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.401 ms, Inlining 0.000 ms, Optimization 0.998 ms, Emission 11.395 ms, Total 13.794 ms
Buffers: shared hit=42131
Worker 5:  actual time=112.604..112.604 rows=0 loops=1
Sort Method: quicksort  Memory: 25kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.406 ms, Inlining 0.000 ms, Optimization 1.016 ms, Emission 11.835 ms, Total 14.258 ms
Buffers: shared hit=41171
->  Parallel Seq Scan on public.mosmix_data  (cost=0.00..390374.08 rows=168 width=41) (actual time=103.656..112.018 rows=14 loops=7)
Output: timestep, name, rh02, tx, tn
Filter: ((mosmix_data.name ~~ '%G385%'::text) AND (mosmix_data.timestep >= now()) AND (mosmix_data.timestep <= (now() + '4 days'::interval)))
Rows Removed by Filter: 210995
Buffers: shared hit=384234
Worker 0:  actual time=82.484..82.484 rows=0 loops=1
Buffers: shared hit=19757
Worker 1:  actual time=53.929..112.467 rows=96 loops=1
Buffers: shared hit=40557
Worker 2:  actual time=82.492..82.492 rows=0 loops=1
Buffers: shared hit=19110
Worker 3:  actual time=112.507..112.507 rows=0 loops=1
Buffers: shared hit=40871
Worker 4:  actual time=112.510..112.511 rows=0 loops=1
Buffers: shared hit=42094
Worker 5:  actual time=112.509..112.509 rows=0 loops=1
Buffers: shared hit=41134
Planning Time: 0.204 ms
JIT:
Functions: 28
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 11.963 ms, Inlining 0.000 ms, Optimization 8.332 ms, Emission 97.045 ms, Total 117.341 ms
Execution Time: 197.158 ms
(75 rows)

更新

所以我听从了每一条建议,我认为这实际上让查询变得更加简单:)

所以我添加了以下索引:

CREATE INDEX items_day_of_creation_idx ON mosmix_data ( (created_at::date) );
CREATE INDEX name_idx_gin ON mosmix_data USING GIN (name gin_trgm_ops);

我还将列name的数据类型更改为text。此外,我将查询更改为CTE:

WITH name AS (select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1) select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name = name 
and timestep between now() and now() + interval '4 days'  
order by timestep ASC;

我还安装了pg_trgm

以下是解释者:

mosmix=# EXPLAIN(ANALYZE, VERBOSE, BUFFERS) WITH name AS (select name 
from mosmix_data 
where timestep > now() 
order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' 
limit 1) select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name = name 
and timestep between now() and now() + interval '4 days'  
order by timestep ASC;

Gather Merge  (cost=209808.68..279068.37 rows=572750 width=39) (actual time=282.161..387.589 rows=573312 loops=1)
Output: timestep, name, rh02, tx, tn
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=190898, temp read=3520 written=3539
->  Sort  (cost=208808.60..209094.98 rows=114550 width=39) (actual time=233.210..246.441 rows=95552 loops=6)
Output: timestep, name, rh02, tx, tn
Sort Key: mosmix_data.timestep
Sort Method: external merge  Disk: 5952kB
Buffers: shared hit=190898, temp read=3520 written=3539
Worker 0:  actual time=223.592..236.370 rows=90733 loops=1
Sort Method: external merge  Disk: 4456kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.453 ms, Inlining 0.000 ms, Optimization 1.033 ms, Emission 11.837 ms, Total 14.323 ms
Buffers: shared hit=29938, temp read=557 written=560
Worker 1:  actual time=223.059..238.316 rows=88448 loops=1
Sort Method: external merge  Disk: 4344kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.444 ms, Inlining 0.000 ms, Optimization 1.028 ms, Emission 11.926 ms, Total 14.398 ms
Buffers: shared hit=29751, temp read=543 written=546
Worker 2:  actual time=223.691..236.821 rows=91575 loops=1
Sort Method: external merge  Disk: 4496kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.465 ms, Inlining 0.000 ms, Optimization 1.035 ms, Emission 11.841 ms, Total 14.341 ms
Buffers: shared hit=30630, temp read=562 written=565
Worker 3:  actual time=223.555..236.986 rows=90809 loops=1
Sort Method: external merge  Disk: 4464kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.446 ms, Inlining 0.000 ms, Optimization 1.035 ms, Emission 11.837 ms, Total 14.318 ms
Buffers: shared hit=30770, temp read=558 written=561
Worker 4:  actual time=223.505..237.041 rows=90596 loops=1
Sort Method: external merge  Disk: 4448kB
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.447 ms, Inlining 0.000 ms, Optimization 1.034 ms, Emission 11.897 ms, Total 14.378 ms
Buffers: shared hit=30083, temp read=556 written=559
->  Parallel Seq Scan on public.mosmix_data  (cost=0.00..197348.43 rows=114550 width=39) (actual time=13.155..203.203 rows=95552 loops=6)
Output: timestep, name, rh02, tx, tn
Filter: ((mosmix_data.name IS NOT NULL) AND (mosmix_data.timestep >= now()) AND (mosmix_data.timestep <= (now() + '4 days'::interval)))
Rows Removed by Filter: 150295
Buffers: shared hit=190713
Worker 0:  actual time=13.184..194.421 rows=90733 loops=1
Buffers: shared hit=29901
Worker 1:  actual time=13.227..195.179 rows=88448 loops=1
Buffers: shared hit=29714
Worker 2:  actual time=13.178..194.910 rows=91575 loops=1
Buffers: shared hit=30593
Planning Time: 0.307 ms
JIT:
Functions: 24
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 8.992 ms, Inlining 0.000 ms, Optimization 6.265 ms, Emission 70.878 ms, Total 86.135 ms
Execution Time: 407.614 ms
(66 rows)

更新

经济放缓的新原因是CTE。没有它,但有了其他措施,性能显著提高:

EXPLAIN(ANALYZE, VERBOSE, BUFFERS) select name from mosmix_data where timestep > now() order by point <-> 'SRID=4326;POINT(11.29575262 47.9366981)' limit 1;
QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.29..1.71 rows=1 width=15) (actual time=0.738..0.738 rows=1 loops=1)
Output: name, ((point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry))
Buffers: shared hit=136
->  Index Scan using mosmix_data_point_idx on public.mosmix_data  (cost=0.29..2070165.86 rows=1456434 width=15) (actual time=0.736..0.736 rows=1 loops=1)
Output: name, (point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
Order By: (mosmix_data.point <-> '0101000020E61000006F362DE36C972640D8DC2CB9E5F74740'::geometry)
Filter: (mosmix_data.timestep > now())
Rows Removed by Filter: 2
Buffers: shared hit=136
Planning Time: 0.142 ms
Execution Time: 0.763 ms
(11 rows)

mosmix=# EXPLAIN(ANALYZE, VERBOSE, BUFFERS) select timestep, name, Rh02, TX, TN 
from mosmix_data 
where name='G385' 
and timestep between now() and now() + interval '4 days'  
order by timestep ASC;
QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=274.79..275.03 rows=97 width=39) (actual time=0.086..0.087 rows=0 loops=1)
Output: timestep, name, rh02, tx, tn
Sort Key: mosmix_data.timestep
Sort Method: quicksort  Memory: 25kB
Buffers: shared hit=3
->  Index Scan using name_idx on public.mosmix_data  (cost=0.43..271.59 rows=97 width=39) (actual time=0.080..0.080 rows=0 loops=1)
Output: timestep, name, rh02, tx, tn
Index Cond: (mosmix_data.name = 'G385'::text)
Filter: ((mosmix_data.timestep >= now()) AND (mosmix_data.timestep <= (now() + '4 days'::interval)))
Buffers: shared hit=3
Planning:
Buffers: shared hit=4
Planning Time: 0.455 ms
Execution Time: 0.118 ms
(14 rows)

简单的方法是在时间戳上创建一个索引,然后命名为:

CREATE INDEX my_idx ON mosmix_data(timestamp, name)

timestamp是索引第一列的一个很好的候选者,因为这是where子句中最严格的条件(在4天的范围内)。

通过将name也放在索引中,like条件将不会使用索引,因为第一个字符是未知的(它是%),可以对索引中的值执行,而不必从表中检索行。