Snowflake:使用WITH子句而不是子查询连接两个表的最有效方式



这是我写的代码。只是想知道有没有办法让它更有效率。我已经将两个独立的表相互连接然后,我想将两个表的结果相互连接:

SELECT 
T1.CITY,
T1.TIMESTAMP,
T1.VALUE AS PARTICULATE_LEVEL,
T2.VALUE AS POLLUTION_LEVEL
FROM
(
WITH PARTICULATE_DATA (CITY,TIMESTAMP, VALUE) AS
(
SELECT LOCATION,
TIMESTAMP,
(MICRO_VALUE * 0.097) AS VALUE

FROM 
CONSOLIDATED_TABLE
)
select a.*,
max(b.VALUE) as MAX_PAR_PREVIOUS_24_HOURS,
from PARTICULATE_DATA as a
left join PARTICULATE_DATA as b
on a.CITY = b.CITY and b.TIMESTAMP between dateadd(day, -1, a.TIMESTAMP) and a.TIMESTAMP
group by 1,2,3,
order by 1,2
) T1
INNER JOIN 
(
WITH POLLUTION_DATA (CITY,TIMESTAMP, VALUE) AS
(
SELECT LOCATION,
TIMESTAMP,
(VALUE ) AS VALUE

FROM 
CONSOLIDATED_TABLE_2
)
select a.*,
max(b.VALUE) as MAX_POLLUTION_LEVEL_PREVIOUS_24_HOURS,
from POLLUTION_DATA as c
left join POLLUTION_DATA as d
on a.CITY = b.CITY and b.TIMESTAMP between dateadd(day, -1, a.TIMESTAMP) and a.TIMESTAMP
group by 1,2,3,
order by 1,2
) T2
ON T1.CITY = T2.CITY

现在有很多事情都是在浪费时间:

不应该出现的两个ORDER BY。如果必须自连接,则应该将DATEADD移动到CTE中,以改进该部分:

WITH particulate_data (city, timestamp, value) AS (
SELECT 
location,
timestamp,
dateadd(day, -1, timestamp) AS ts_m1day,
micro_value * 0.097 AS value
FROM consolidated_table
)
select 
a.location,
a.timestamp,
a.value,
max(b.value) AS max_par_previous_24_hours
FROM particulate_data AS a
LEFT JOIN particulate_data AS b
ON a.city = b.city 
AND b.timestamp BETWEEN a.ts_m1day AND a.TIMESTAMP
GROUP BY 1,2,3

然后在T1T2之间的连接上,您只在CITY上连接,但是对于每个T1时间戳,您将获得T2日期。现在可以清楚地看到,这两个数据源可能不会重叠。

但是,如果您将数据卷起来(截断)时间戳到日期/日期,那么您可以在MAX中聚合这些值,从而避免自连接,这有一个更简单的窗口,"这个"天"的最大值"与每个记录的最后24小时。

如何写SQL:

WITH particulate_data AS (
SELECT 
location as city,
timestamp::date as day,
max(micro_value * 0.097) AS max_par
FROM consolidated_table
GROUP BY 1,2
), pollution_data AS (
SELECT 
location as city,
timestamp::date as day,
max(VALUE) as max_pollution_level
FROM consolidated_table_2
GROUP BY 1,2
)
SELECT 
t1.city,
t1.day,
t1.value AS particulate_level,
t2.value AS pollution_level
FROM particulate_data AS t1
JOIN pollution_data AS T2
ON t1.city = t2.city AND t1.day = t2.day

最新更新