GreenPlum在join查询中选择了错误的查询计划



请原谅我的蹩脚英语

我有两个表在greenplum(版本是:PostgreSQL 9.4.20 (greenplum Database 6.0.0-beta.3))

一个表是:cookie_session

CREATE TABLE "ods_overall_cookie"."cookie_session" (
"site_cookie" varchar(80) COLLATE "pg_catalog"."default",
"createtime" timestamp(6),
"analyse_domain_cookie" varchar(30) COLLATE "pg_catalog"."default",
"id" int4 NOT NULL,
.... other fields....
) 
DISTRIBUTED by(analyse_domain_cookie)
;
CREATE INDEX "index_cookie_session_id" ON "ods_overall_cookie"."cookie_session" USING btree (
"id" "pg_catalog"."int4_ops" ASC NULLS LAST
);
CREATE INDEX "index_analysis_domain_cookie_btree" ON "ods_overall_cookie"."cookie_session" USING btree (
"analyse_domain_cookie" COLLATE "pg_catalog"."default" "pg_catalog"."text_ops" ASC NULLS LAST
);

和另一个表:ta202202

CREATE TABLE "ods_log"."ta202202" (
"id" serial8,
"uvcookie" varchar(50) COLLATE "pg_catalog"."default",
.... other fields ...
) distributed by (uvcookie)
;
CREATE INDEX "index_ta202202_id" ON "ods_log"."ta202202" USING btree (
"id" "pg_catalog"."int8_ops" ASC NULLS LAST
);

CREATE INDEX "indev_ta202202_uvcookie" ON "ods_log"."ta202202" USING btree (
"uvcookie" COLLATE "pg_catalog"."default" "pg_catalog"."text_ops" ASC NULLS LAST
);

这两个表分别有大约1亿个数据。
我的查询sql是:

select o.id,g.site_cookie 
from ods_log.ta202201  o 
join ods_overall_cookie.cookie_session  as g 
on g.analyse_domain_cookie  = o.uvcookie 
WHERE o.ID BETWEEN 20000000 and 20000077;

此查询在0.14秒内返回,解释ANALYZE结果为:

Gather Motion 24:1  (slice1; segments: 24)  (cost=0.00..434.40 rows=1 width=41) (actual time=1.785..4.098 rows=552 loops=1)
->  Nested Loop  (cost=0.00..434.40 rows=1 width=41) (actual time=0.225..1.948 rows=276 loops=1)
Join Filter: true
->  Index Scan using index_ta202201_id on ta202201  (cost=0.00..6.02 rows=3 width=25) (actual time=0.100..0.142 rows=8 loops=1)
Index Cond: ((id >= 20000000) AND (id <= 20000077))
->  Index Scan using index_analysis_domain_cookie_btree on cookie_session  (cost=0.00..428.38 rows=1 width=33) (actual time=0.013..0.213 rows=34 loops=8)
Index Cond: ((analyse_domain_cookie)::text = (ta202201.uvcookie)::text)
Planning time: 59.930 ms
(slice0)    Executor memory: 216K bytes.
(slice1)    Executor memory: 156K bytes avg x 24 workers, 156K bytes max (seg0).
(slice2)    
Memory used:  128000kB
Optimizer: Pivotal Optimizer (GPORCA) version 3.39.0
Execution time: 26.725 ms

似乎使用了嵌套循环

但是当我在where条件中增加ID范围时,即使只加1,如:0。ID在20000000和20000078之间,时间消耗变为25秒,增加200倍

Gather Motion 24:1  (slice1; segments: 24)  (cost=0.00..437.02 rows=1 width=41) (actual time=10266.694..23884.316 rows=557 loops=1)
->  Hash Join  (cost=0.00..437.02 rows=1 width=41) (actual time=12256.944..23881.566 rows=276 loops=1)
Hash Cond: ((ta202201.uvcookie)::text = (cookie_session.analyse_domain_cookie)::text)
Extra Text: (seg0)   Initial batch 0:
(seg0)     Wrote 874907K bytes to inner workfile.
(seg0)     Wrote 1K bytes to outer workfile.
(seg0)   Overflow batches 1..7:
(seg0)     Read 1200209K bytes from inner workfile: 171459K avg x 7 nonempty batches, 335761K max.
(seg0)     Wrote 766456K bytes to inner workfile: 127743K avg x 6 overflowing batches, 304587K max.
(seg0)     Read 1K bytes from outer workfile: 1K avg x 4 nonempty batches, 1K max.
(seg0)     Wrote 1K bytes to outer workfile.
(seg0)   Secondary Overflow batches 8..32767:
(seg0)     Read 2014970K bytes from inner workfile: 9201K avg x 219 nonempty batches, 258871K max.
(seg0)     Wrote 1573816K bytes to inner workfile: 12107K avg x 130 overflowing batches, 247277K max.
(seg0)     Read 1K bytes from outer workfile.
(seg0)   Hash chain length 4.2 avg, 4645100 max, using 3735148 of 59506688 buckets.  Skipped 32541 empty batches.
->  Index Scan using index_ta202201_id on ta202201  (cost=0.00..6.02 rows=4 width=25) (actual time=0.380..0.428 rows=8 loops=1)
Index Cond: ((id >= 20000000) AND (id <= 20000078))
->  Hash  (cost=431.00..431.00 rows=1 width=51) (actual time=12253.540..12253.540 rows=15647864 loops=1)
->  Seq Scan on cookie_session  (cost=0.00..431.00 rows=1 width=51) (actual time=0.058..5175.550 rows=15647865 loops=1)
Planning time: 62.416 ms
(slice0)    Executor memory: 184K bytes.
* (slice1)    Executor memory: 245659K bytes avg x 24 workers, 375566K bytes max (seg0).  Work_mem: 290371K bytes max, 1149907K bytes wanted.
Memory used:  128000kB
Memory wanted:  1150306kB
Optimizer: Pivotal Optimizer (GPORCA) version 3.39.0
Execution time: 23927.425 ms

My query plain fromNested Loop改为Hash Join似乎greenplum选择了一个糟糕的查询计划。我继续调整我的between条件,结果是:

<表类>与计划速度tbody><<tr>2000000020000077嵌套循环快2000000020000078散列连接缓慢2000000120000078嵌套循环快2000000120000079散列连接缓慢2000000220000079嵌套循环快3000000030000068嵌套循环快3000000030000069散列连接缓慢3000000130000069嵌套循环快

根据凯文的回答,执行

SET optimizer_enable_hashjoin = OFF;
SET optimizer_enable_tablescan = OFF;

成功了,问题终于解决了

最新更新