Hive Query,有什么好方法可以优化这些联合吗



ALL,我是HIVE和一般查询优化的新手。

我有3个联合或多或少都是完全相同的查询。这些联合存在的唯一原因是,我的源表没有周末或假日日期,并且我需要为不存在的假日/周末日期保留源表中存在的前一个日历日的一些基本值。Dateadd函数实际上是3个联合(1、2或3天(中唯一的区别符

有没有办法将这3个查询合并为一个,或者只是以一种更具性能的方式来实现这一点?

我有点困了,但我已经从45分钟的整个过程减少到了4分半钟。只是不知道如何优化这些工会。请帮助:/

UNION ALL 
--ADDING 1 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,1) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)
and  a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1
UNION ALL 
--ADDING 2 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,2) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)
and  a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1
UNION ALL 
--ADDING 3 DAYS To Holidays
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,3) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)
and  a.year=2020 and a.month=09
and performance_end_date in ('2020-09-04 00:00:00.000','2020-10-09 00:00:00.000')
) a
where row_no= 1

如果它与您所写的完全一样,唯一的区别是date_add参数函数,那么您可以从其中一个并集中获取sql,并将其与1、2和3个常量之间的并集交叉连接。也许交叉连接会比工会更好;还取决于来源的数字。此外,您可以在执行交叉联接之前过滤行号,以便联接更少的行。在下面发布的示例中,我没有过滤行号。

查询将如下所示:

SELECT a.portfolio_name, 
Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS 
performance_end_date, 
a.car_return, 
a.nav, 
a.nav_id, 
a.performance_end_date, 
a.row_no 
FROM   (SELECT a.portfolio_name, 
-- Cast(Date_add(performance_end_date, 1) AS TIMESTAMP) AS performance_end_date, 
Cast(0.0000000 AS STRING)   AS car_return, 
a.nav, 
a.nav_id, 
a.performance_end_date, 
Row_number() 
OVER ( 
partition BY a.portfolio_code, a.performance_end_date 
ORDER BY a.nav_id DESC) AS row_no 
FROM   carsales a 
WHERE  a.portfolio_code IN ( '1994', 1998, 2523 ) 
AND a.year = 2020 
AND a.month = 09 
AND Dayofweek(performance_end_date) = 6) a 
CROSS JOIN (SELECT 1 crs 
UNION ALL 
SELECT 2 
UNION ALL 
SELECT 3) crs 

编辑1:关于日期1或日期2的评论,你可以像你写的那样做得很好。在where子句中,输入date_column=something或date_columl=something。

SELECT a.portfolio_name, 
Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS 
performance_end_date, 
a.car_return, 
a.nav, 
a.nav_id, 
a.performance_end_date, 
a.row_no 
FROM   (SELECT a.portfolio_name, 
-- Cast(Date_add(performance_end_date, 1) AS TIMESTAMP) AS performance_end_date, 
Cast(0.0000000 AS STRING)   AS car_return, 
a.nav, 
a.nav_id, 
a.performance_end_date, 
Row_number() 
OVER ( 
partition BY a.portfolio_code, a.performance_end_date 
ORDER BY a.nav_id DESC) AS row_no 
FROM   carsales a 
WHERE  a.portfolio_code IN ( '1994', 1998, 2523 ) 
AND a.year = 2020 
AND a.month = 09 
AND (Dayofweek(performance_end_date) = 6 or performance_end_date in ('2020-09-04 00:00:00.000','2020-10-09 00:00:00.000'))
) a 
CROSS JOIN (SELECT 1 crs 
UNION ALL 
SELECT 2 
UNION ALL 
SELECT 3) crs

除了@F.Lazarescu答案,您还可以重写CROSS JOIN子查询。

取而代之的是:

CROSS JOIN (SELECT 1 crs 
UNION ALL 
SELECT 2 
UNION ALL 
SELECT 3) crs 

使用stack()UDTF,它将执行得更快:

CROSS JOIN (SELECT stack(3, 1,2,3) as crs) crs 

最新更新