INSERT ... SELECT gets a better plan when a LIMIT clause is added



This is the server I am running:

select version();
version
---------------------------------------------------------------------------    
PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

I first wrote the SELECT (ext.t_event and ext.t_event_data are two foreign tables that oracle_fdw, version 1.1, fetches from a remote Oracle database):

select 
te.id_data, 
te.id_device, 
te.date_write, 
te.date_event, 
ted.i_inout, 
ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
and te.id_data=ted.id_data;

Fetching the whole record set (3,600 records) takes about 10 seconds.

But then I turned the SELECT into an INSERT ... SELECT:

insert into stg_data
select 
te.id_data, 
te.id_device, 
te.date_write, 
te.date_event, 
ted.i_inout, 
ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
and te.id_data=ted.id_data;

I was forced to kill this query after it had been running for more than 30 minutes!

After hours of struggling and desperate attempts, I decided to try this:

insert into stg_data
select 
te.id_data, 
te.id_device, 
te.date_write, 
te.date_event, 
ted.i_inout, 
ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
and te.id_data=ted.id_data
limit 5000;

And... within 20 seconds I had the whole record set stored in stg_data.

To better understand the difference, I decided to look at the plans.

SELECT without LIMIT

Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=548)
  Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/ r1."ID_DATA",
  r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT", r2."I_INOUT",
  r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER JOIN
  "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA") AND
  (r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS
  TIMESTAMP))) AND (r1."DATE_EVENT" <
  (CAST ('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP)))
  AND (r1."ID_DEVICE" = 2749651))

SELECT with LIMIT

Limit  (cost=10000.00..20000.00 rows=1000 width=548)
  ->  Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=548)
        Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/
        r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT",
        r2."I_INOUT", r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER
        JOIN "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA")
        AND (r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS
        TIMESTAMP))) AND (r1."DATE_EVENT" < (CAST ('2019-01-17
        00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."ID_DEVICE" = 2749651))

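So both variants send essentially the same query to Oracle, and the one with LIMIT simply applies the limit locally once the rows have been fetched.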

Do the INSERT-SELECT plans look the same? No!

INSERT-SELECT with LIMIT

Insert on stg_data_hist  (cost=10000.00..20010.00 rows=1000 width=548)
  ->  Limit  (cost=10000.00..20000.00 rows=1000 width=548)
        ->  Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=548)
              Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/
              r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE",
              r1."DATE_EVENT", r2."I_INOUT", r2."VALUE" FROM
              ("DISPATCH"."T_EVENT" r1 INNER JOIN
              "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" =
              r2."ID_DATA") AND (r1."DATE_EVENT" >= (CAST ('2019-01-16
              00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."DATE_EVENT" <
              (CAST('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP))) AND
              (r1."ID_DEVICE" = 2749651))

INSERT-SELECT without LIMIT

Insert on stg_data_hist  (cost=30012.50..40190.00 rows=5000 width=548)
  ->  Hash Join  (cost=30012.50..40190.00 rows=5000 width=548)
        Hash Cond: (te.id_data = ted.id_data)
        ->  Foreign Scan on t_event te  (cost=10000.00..20000.00 rows=1000 width=28)
              Oracle query: SELECT /*93379c271b3f1bc08a1dbb94fb89f739*/
              r3."ID_DATA", r3."ID_DEVICE", r3."DATE_WRITE", r3."DATE_EVENT"
              FROM "DISPATCH"."T_EVENT" r3 WHERE (r3."DATE_EVENT" >=
              (CAST ('2019-01-16 00:00:00.000000 AD' AS TIMESTAMP))) AND
              (r3."DATE_EVENT" < (CAST ('2019-01-17 00:00:00.000000 AD' AS
              TIMESTAMP))) AND (r3."ID_DEVICE" = 2749651)
        ->  Hash  (cost=20000.00..20000.00 rows=1000 width=528)
              ->  Foreign Scan on t_event_data ted  (cost=10000.00..20000.00 rows=1000 width=528)
                    Oracle query: SELECT /*21c8741f2fa8a8d13d037c3191e8ac96*/
                    r4."ID_DATA", r4."I_INOUT", r4."VALUE" FROM
                    "DISPATCH"."T_EVENT_DATA" r4

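This explains why it takes so much longer than the other one. It retrieves the date-filtered records from one foreign table and the complete record set from the second foreign table, then joins them locally. That takes a long time!! The second table holds millions of records, not a few thousand.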

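Finally, my two questions:

1) I would like to get the first plan but without the LIMIT clause (it gives me the shivers :-). How would you do that? I do not intend to apply any filter to ext.t_event_data other than the join clause.

2) Why do the two INSERT-SELECT plans look so different when the two SELECT plans look so similar?

Thanks for reading, and have a nice day.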

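The planner seems to think it will get only a few thousand rows, which is clearly wrong. Make sure the statistics of your foreign tables are up to date by running "ANALYZE ext.t_event", and do the same for ext.t_event_data, because, as the oracle_fdw documentation puts it: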

https://github.com/laurenz/oracle_fdw

PostgreSQL will not automatically gather statistics for foreign tables with the autovacuum daemon.

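Keep in mind that analyzing an Oracle foreign table will result in a full sequential table scan. You can use the table option sample_percent to speed this up by using only a sample of the Oracle table.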
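As a minimal sketch of that advice, the statements below refresh the statistics for both foreign tables from the question. The sample_percent value of 10 is an arbitrary illustration, not a recommendation, and I have not run this against your setup:

```sql
-- Optional: let ANALYZE sample only part of the large Oracle table.
-- The value '10' is an assumed example; use SET instead of ADD
-- if the option is already defined on the foreign table.
ALTER FOREIGN TABLE ext.t_event_data OPTIONS (ADD sample_percent '10');

-- Autovacuum does not do this for foreign tables, so run it by hand.
ANALYZE ext.t_event;
ANALYZE ext.t_event_data;
```

Afterwards, re-run EXPLAIN on the INSERT to see whether the row estimates and the plan have changed.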

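When LIMIT is used, the join is pushed down to Oracle in both the SELECT and the INSERT case, so the only reason I can see for it not being pushed down in the INSERT without LIMIT is the lack of accurate table statistics. You could also try rewriting the insert as a CTE. In PostgreSQL 10 a CTE is planned separately from the outer statement, so the SELECT inside it may get the same plan it gets when run on its own (for obvious reasons, this query has not been tested):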

WITH foreign_data AS (
select 
te.id_data, 
te.id_device, 
te.date_write, 
te.date_event, 
ted.i_inout, 
ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
and te.id_data=ted.id_data
)
insert into stg_data select * from foreign_data;

You could also try rewriting the query with an explicit inner join instead of putting the join condition (te.id_data=ted.id_data) in the where clause.
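A sketch of that explicit-join form, using the tables and columns from the question, could look like this. It is untested, and there is no guarantee that it changes the plan, so it is wrapped in EXPLAIN to check what the planner does before actually running the insert:

```sql
-- EXPLAIN only shows the plan; remove it to perform the insert.
EXPLAIN
INSERT INTO stg_data
SELECT te.id_data,
       te.id_device,
       te.date_write,
       te.date_event,
       ted.i_inout,
       ted.value
FROM ext.t_event te
JOIN ext.t_event_data ted ON te.id_data = ted.id_data
WHERE te.id_device = 2749651
  AND te.date_event >= '2019-01-16'
  AND te.date_event < '2019-01-17';
```

If the join is pushed down, the plan should show a single Foreign Scan whose Oracle query contains the INNER JOIN, as in the plans with LIMIT in the question.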
