Snowflake临时表格排序



我正在尝试在CTE(WITH(和临时表之间进行性能比较。我已经将查询转换为已使用的Temp表,但看到这样一种情况:当作为一个整体运行时,执行失败,因为其中一个早期的临时表执行被取消。示例:

// Produces large result set
CREATE temporary table tt1 as
SELECT DISTINCT
t1.c1
FROM t1
//ADDITIONAL LOGIC
;
CREATE temporary table tt2 as
SELECT t2.c1
t2.c2
FROM t2;
CREATE temporary table tt3 as
SELECT t1.c1
FROM t1
INNER JOIN tt1
on t1.c1 = tt1.c1;

当tt3开始运行时,执行似乎被取消了,从tt3到tt1有一个依赖关系。我尝试过CTAS,也尝试过用标准的CREATE TABLE定义表,然后再做INSERT INTO,但它的结果是取消执行。

有没有一种方法可以通过标准sql语法更好地对这些进行排序,而不必遵循存储过程或任务的路线?

编辑[2]下面是一个使用行为相同的新冠病毒数据集的更简单场景。两种情况下的终止时间似乎都在35秒左右。因此,出于复制目的,如果您可以生成一个运行2分钟的查询,然后创建一个简单的后续临时表,从您应该能够复制的第一个表中进行选择。

CREATE TEMPORARY TABLE PLATFORM_EVALUATIONS.PUBLIC.TT1 AS
select a.COUNTRY_REGION
, a.PROVINCE_STATE
, a.COUNTY
, current_date() c_date
, count(*) counts
from PUBLIC.JHU_COVID_19_TIMESERIES a
inner join public.jhu_dashboard_covid_19_global b
on a.country_region = b.country_region
where 1=1
and a.COUNTRY_REGION = 'United States'
group by a.COUNTRY_REGION
, a.PROVINCE_STATE
, a.COUNTY;

CREATE TEMPORARY TABLE PLATFORM_EVALUATIONS.PUBLIC.TT2 AS
SELECT COUNTRY_REGION
, COUNTS
FROM PLATFORM_EVALUATIONS.PUBLIC.TT1;

第1版:工作历史屏幕截图您可以在屏幕截图中看到,_valid_barcodes(tt1(的持续时间较短(通常为2米14秒(,其他步骤在完成/终止之前开始。

// this is tt1 in my example it produces 3,657,599 rows in 2m 14s but is cancelled short when run as a whole.
CREATE temporary table business_vault_dev.dnr._valid_barcodes as
SELECT DISTINCT
e.Barcode
FROM BUSINESS_VAULT_DEV.DNR.DIM_EVENT_VW e
INNER JOIN BUSINESS_VAULT_DEV.DNR.DIM_PACKAGE_VW p
on e.BARCODE = COALESCE(p.Barcode, p.Barcode)
INNER JOIN BUSINESS_VAULT_DEV.DNR.DIM_ORDER_VW o
on p.order_key = o.order_key
INNER JOIN BUSINESS_VAULT_DEV.SCORECARD.DIM_CUSTOMER_VW cl
on o.CUSTOMER_ID=cl.CUSTOMER_ID
WHERE e.Event_Type='Delivered'
and o.service_code not in ('XD', 'PU')
and lower(cl.NAME) IN ('a', 'b', 'c')
GROUP BY e.Barcode
HAVING date(min(e.EST_Event_DateTime)) between '2020-09-05' and '2020-10-02';
// this is tt2 
CREATE temporary table business_vault_dev.dnr._driver_merge as 
select 
//redacted
from data_lake.driver.driver_table1
union
select 
//redacted
from data_lake.driver.driver_table2
union
select 
//redacted
from data_lake.driver.driver_table3
UNION
select 
//redacted
from data_lake.driver.driver_table4;
// this is unreferenced in my example but completes successfully and references the previous temp table    
CREATE temporary table business_vault_dev.dnr._driver as 
select 
// redacted
from business_vault_dev.dnr._driver_merge dm
group by dm.contractor_number;
// this is tt3 in my example
CREATE temporary table business_vault_dev.dnr._slef_dedup as 
SELECT
e.Barcode
// redacting columns here
FROM BUSINESS_VAULT_DEV.DNR.DIM_EVENT_VW e
INNER JOIN business_vault_dev.dnr._valid_barcodes vb
ON e.Barcode = vb.Barcode
LEFT JOIN data_lake.driver.driver_table1 cn
ON //redacted
LEFT JOIN data_lake.driver.driver_table2 pcp
ON //redacted
;

Daniel,我能够使用您的简单场景和COVID数据库在测试中重现这个问题。新的用户界面似乎有问题。它开始运行第二个CREATE TABLE命令,但没有完成第一个命令。请向Snowflake支持提交一张门票。

在提交此票证时生成并共享HAR文件,并包含SQL以进行复制,这将非常有用。

https://community.snowflake.com/s/article/How-to-generate-a-HAR-file-for-Web-Interface-Troubleshooting

最新更新