How to extract only the most recent week from a SELECT ... OVER statement in HiveQL



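I need some help. I have written a query that keeps, for each element, a running total of a measure that is either 1 or 0. Whenever the measure is 0, the running total resets to 0. For example: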

year_week  element  measure  running_total
2020_40    A        1        1
2020_41    A        1        2
2020_42    A        1        3
2020_43    A        0        0
2020_44    A        1        1
2020_45    A        1        2
2020_40    B        1        1
2020_41    B        1        2
2020_42    B        1        3
2020_43    B        1        4
2020_44    B        1        5
2020_45    B        1        6

The output above is produced by the following query:

SELECT element,
       year_week,
       measure,
       SUM(measure) OVER (PARTITION BY element, flag_sum ORDER BY year_week
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM (
    SELECT *,
           SUM(measure_flag) OVER (PARTITION BY element ORDER BY year_week
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flag_sum
    FROM (
        SELECT *,
               CASE WHEN measure = 1 THEN 0 ELSE 1 END AS measure_flag
        FROM database.table
    ) x
) y
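To see why this works, here is a minimal Python sketch of the same three query levels run on the sample rows: measure_flag marks each reset row, its running sum flag_sum gives every unbroken streak its own group id, and summing measure within (element, flag_sum) yields the resetting total. It assumes year_week values are zero-padded strings (e.g. 2020_05), so that sorting them as text matches chronological order; the names ROWS and running_totals are illustrative only.

```python
from itertools import groupby

# Sample rows from the question: (year_week, element, measure).
ROWS = [
    ("2020_40", "A", 1), ("2020_41", "A", 1), ("2020_42", "A", 1),
    ("2020_43", "A", 0), ("2020_44", "A", 1), ("2020_45", "A", 1),
    ("2020_40", "B", 1), ("2020_41", "B", 1), ("2020_42", "B", 1),
    ("2020_43", "B", 1), ("2020_44", "B", 1), ("2020_45", "B", 1),
]


def running_totals(rows):
    """Mimic the query levels: measure_flag, flag_sum, then running_total."""
    out = []
    # PARTITION BY element ORDER BY year_week (assumes zero-padded week strings).
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for element, part in groupby(ordered, key=lambda r: r[1]):
        flag_sum = 0   # running SUM(measure_flag) within this element
        totals = {}    # running SUM(measure) per (element, flag_sum) group
        for year_week, _, measure in part:
            measure_flag = 0 if measure == 1 else 1
            flag_sum += measure_flag
            totals[flag_sum] = totals.get(flag_sum, 0) + measure
            out.append((year_week, element, measure, totals[flag_sum]))
    return out


for row in running_totals(ROWS):
    print(*row)
```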

This works well, but I only want to return the most recent week of data for each element. In the example above, that would be:

year_week  element  measure  running_total
2020_45    A        1        2
2020_45    B        1        6

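Essentially, I need to keep the logic unchanged but restrict the rows that are returned. I have tried this, but my attempt changes the results from the correct running totals to just 1 or 0.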

Many thanks for any help!

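You can add another level of nesting and use row_number() to keep only the latest record for each element. Compute row_number() alongside the other window functions and filter on it only in the outermost query, so that the running totals are still calculated over all rows.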
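Evaluation order is what matters here: in SQL, a WHERE clause is applied before the window functions of the same SELECT are computed. Filtering to the latest week at that level, or in a subquery below it, therefore leaves each partition with a single row, and the running total collapses to that row's measure. This is one plausible explanation for the 1-or-0 result in the question, though the attempted query is not shown. The Python sketch below compares the two orders on the sample data; the helper names are hypothetical.

```python
# Sample rows from the question: (year_week, element, measure).
ROWS = [
    ("2020_40", "A", 1), ("2020_41", "A", 1), ("2020_42", "A", 1),
    ("2020_43", "A", 0), ("2020_44", "A", 1), ("2020_45", "A", 1),
    ("2020_40", "B", 1), ("2020_41", "B", 1), ("2020_42", "B", 1),
    ("2020_43", "B", 1), ("2020_44", "B", 1), ("2020_45", "B", 1),
]


def with_running_total(rows):
    """Window step: per-element running total of measure that resets on a 0."""
    out, totals = [], {}
    for year_week, element, measure in sorted(rows, key=lambda r: (r[1], r[0])):
        if measure == 0:
            totals[element] = 0
        else:
            totals[element] = totals.get(element, 0) + measure
        out.append((year_week, element, measure, totals[element]))
    return out


def latest_week_only(rows):
    """Filter step: keep the newest year_week per element (like rn = 1)."""
    newest = {}
    for row in rows:
        if row[1] not in newest or row[0] > newest[row[1]][0]:
            newest[row[1]] = row
    return sorted(newest.values(), key=lambda r: r[1])


# Window first, filter last (what the nested query does): totals survive.
print(latest_week_only(with_running_total(ROWS)))
# Filter first, window last: one row per element, so the total equals measure.
print(with_running_total(latest_week_only(ROWS)))
```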

I would suggest:

select element, year_week, measure, running_total
from (
    select t.*,
           -- row_number() counts the rows in each group. Groups after a reset
           -- begin with their 0 row, so subtract 1 there, but not for grp = 0
           row_number() over(partition by element, grp order by year_week)
             - case when grp = 0 then 0 else 1 end as running_total
    from (
        select t.*,
               sum(1 - measure) over(partition by element order by year_week) as grp,
               row_number() over(partition by element order by year_week desc) as rn
        from mytable t
    ) t
) t
where rn = 1
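The row_number() shortcut relies on measure being 0 or 1: the running total is then just the number of 1 rows since the last 0. Every group except the first (grp = 0) begins with its own 0 row, so row_number() over-counts by one there. The first group has no leading 0 and must not be reduced; subtracting 1 unconditionally would, for example, report 5 instead of 6 for element B. Here is a minimal Python check, with hypothetical function names, that the adjusted row_number() form matches sum(measure) on a few 0/1 sequences:

```python
def grp_ids(measures):
    """sum(1 - measure) over (order by year_week): a new group id at every 0."""
    ids, grp = [], 0
    for m in measures:
        grp += 1 - m
        ids.append(grp)
    return ids


def totals_by_sum(measures):
    """General form: sum(measure) over (partition by grp order by year_week)."""
    sums, out = {}, []
    for g, m in zip(grp_ids(measures), measures):
        sums[g] = sums.get(g, 0) + m
        out.append(sums[g])
    return out


def totals_by_row_number(measures):
    """0/1 shortcut: row_number() within grp, minus 1 unless grp = 0."""
    counts, out = {}, []
    for g in grp_ids(measures):
        counts[g] = counts.get(g, 0) + 1  # row_number() within this grp
        out.append(counts[g] - (0 if g == 0 else 1))
    return out


for sample in ([1, 1, 1, 0, 1, 1], [1, 1, 1, 1, 1, 1], [0, 0, 1, 0, 1, 1]):
    assert totals_by_row_number(sample) == totals_by_sum(sample)
    print(sample, totals_by_sum(sample))
```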

Since measure only takes the values 0 and 1, as the sample data shows, I simplified the query slightly. If that is not the case, then:

select element, year_week, measure, running_total
from (
    select t.*,
           sum(measure) over(partition by element, grp order by year_week) as running_total
    from (
        select t.*,
               sum(case when measure = 0 then 1 else 0 end) over(partition by element order by year_week) as grp,
               row_number() over(partition by element order by year_week desc) as rn
        from mytable t
    ) t
) t
where rn = 1
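To try the general query without a Hive cluster, here is a sketch that runs it, with only the layout changed, against the sample data in an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first release with window functions) and uses the table name mytable from the answer. Hive and SQLite differ in many details, so treat this as a check of the logic rather than of Hive itself.

```python
import sqlite3

# Sample rows from the question: (year_week, element, measure).
ROWS = [
    ("2020_40", "A", 1), ("2020_41", "A", 1), ("2020_42", "A", 1),
    ("2020_43", "A", 0), ("2020_44", "A", 1), ("2020_45", "A", 1),
    ("2020_40", "B", 1), ("2020_41", "B", 1), ("2020_42", "B", 1),
    ("2020_43", "B", 1), ("2020_44", "B", 1), ("2020_45", "B", 1),
]

# The general answer query; window functions run in the inner levels and the
# rn = 1 filter is applied only in the outermost level.
QUERY = """
select element, year_week, measure, running_total
from (
    select t.*,
           sum(measure) over(partition by element, grp order by year_week) as running_total
    from (
        select t.*,
               sum(case when measure = 0 then 1 else 0 end)
                   over(partition by element order by year_week) as grp,
               row_number() over(partition by element order by year_week desc) as rn
        from mytable t
    ) t
) t
where rn = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (year_week text, element text, measure integer)")
conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
# The query has no ORDER BY, so sort in Python for a stable result.
result = sorted(conn.execute(QUERY).fetchall())
conn.close()
print(result)
```

It should return one row per element: A with a running total of 2 and B with 6, matching the desired output in the question.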
