BigQuery: filter WHERE on the last 5 rows by date for each value of a categorical column



Apologies if the title is a bit wordy. I'll create an example below to highlight what I mean.

t1

date        team    num_val
2017-10-04    ab          7  
2017-10-03    ab          6
2017-10-02    ab          8
2017-10-05    ab          3
2017-10-07    ab         12
2017-10-06    ab          3
2017-10-01    ab          5
2017-09-08    cd          4
2017-09-09    cd          8
2017-09-10    cd          2
2017-09-14    cd          1
2017-09-13    cd          5
2017-09-11    cd          6
2017-09-12    cd         13

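Given this table, I would simply like to:

  • Filter to the 5 most recent dates for each team
  • Group by team and sum the num_val column

Simple enough. However, there is no rhyme or reason to the dates for each team (I cannot simply filter on the 5 most recent dates overall, because they may differ for each team). I currently have the following query skeleton: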

SELECT
team, 
sum(num_val)
FROM t1
GROUP BY team

Any help getting this across the finish line would be appreciated, thanks!!

Below are a few options for BigQuery Standard SQL, so you can see different approaches.

Option #1

#standardSQL
SELECT team, SUM(num_val) sum_num FROM (
SELECT team, num_val, ROW_NUMBER() OVER(PARTITION BY team ORDER BY DATE DESC) pos
FROM `project.dataset.table`
)
WHERE pos <= 5
GROUP BY team

Option #2

#standardSQL
SELECT team, sum_num FROM (
SELECT team, 
SUM(num_val) OVER(PARTITION BY team ORDER BY DATE DESC ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING) AS sum_num, 
ROW_NUMBER() OVER(PARTITION BY team ORDER BY DATE DESC) pos
FROM `project.dataset.table`
)
WHERE pos = 1  
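
To see why Option #2 works, here is a minimal Python sketch of the window frame, not BigQuery code. It assumes only the team "ab" rows from the question and plain lists in place of the table; the names `rows`, `frame_sums`, and `latest_five_sum` are illustrative.

```python
from datetime import date

# Team "ab" rows from the question's sample table: (date, num_val).
rows = [
    (date(2017, 10, 4), 7),
    (date(2017, 10, 3), 6),
    (date(2017, 10, 2), 8),
    (date(2017, 10, 5), 3),
    (date(2017, 10, 7), 12),
    (date(2017, 10, 6), 3),
    (date(2017, 10, 1), 5),
]

# Equivalent of ORDER BY date DESC within the partition.
ordered = sorted(rows, key=lambda r: r[0], reverse=True)
values = [num_val for _, num_val in ordered]

# Equivalent of ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING:
# each row's frame is the row itself plus up to four rows after it.
frame_sums = [sum(values[i:i + 5]) for i in range(len(values))]

# Equivalent of WHERE pos = 1: keep the frame of the most recent row,
# which covers exactly the five latest dates.
latest_five_sum = frame_sums[0]

print(frame_sums)
print(latest_five_sum)
```

Every row gets its own frame sum, but only the first row in descending date order has a frame that spans the five latest dates, which is why the outer query keeps `pos = 1`.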

If applied to the sample data from your question, both produce the following result:

Row team    sum_num  
1   ab      31   
2   cd      27     

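While the above options can be useful in some more complicated cases, in your particular case I would go with the option presented in Filipe's answer (something like the following):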

#standardSQL
SELECT team, (SELECT SUM(num_val) FROM UNNEST(num_values)) sum_num
FROM (
SELECT team, ARRAY_AGG(STRUCT(num_val) ORDER BY DATE DESC LIMIT 5) num_values
FROM `project.dataset.table`
GROUP BY team
)

To get the latest 5 for each team:

SELECT team, ARRAY_AGG(num_val ORDER BY date DESC LIMIT 5) arr
FROM x
GROUP BY team

Then UNNEST the array and sum those num_val values:

SELECT team, (SELECT SUM(num_val) FROM UNNEST(arr) num_val) the_sum
FROM (
  SELECT team, ARRAY_AGG(num_val ORDER BY date DESC LIMIT 5) arr FROM x GROUP BY team
)
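
The two steps above (collect the latest 5 values per team into an array, then sum the array) can be mirrored in plain Python to sanity-check the result on the question's sample data. This is only a sketch under stated assumptions: the helper `latest_n_sum` and the in-memory list `t1` are hypothetical stand-ins, not part of BigQuery.

```python
from collections import defaultdict

# Sample rows from the question's table t1: (date, team, num_val).
t1 = [
    ("2017-10-04", "ab", 7), ("2017-10-03", "ab", 6), ("2017-10-02", "ab", 8),
    ("2017-10-05", "ab", 3), ("2017-10-07", "ab", 12), ("2017-10-06", "ab", 3),
    ("2017-10-01", "ab", 5), ("2017-09-08", "cd", 4), ("2017-09-09", "cd", 8),
    ("2017-09-10", "cd", 2), ("2017-09-14", "cd", 1), ("2017-09-13", "cd", 5),
    ("2017-09-11", "cd", 6), ("2017-09-12", "cd", 13),
]


def latest_n_sum(rows, n=5):
    """Sum num_val over the n most recent dates of each team."""
    by_team = defaultdict(list)
    for day, team, num_val in rows:
        by_team[team].append((day, num_val))

    sums = {}
    for team, items in by_team.items():
        # Step 1, like ARRAY_AGG(num_val ORDER BY date DESC LIMIT n).
        # ISO-formatted date strings sort chronologically.
        items.sort(key=lambda item: item[0], reverse=True)
        arr = [num_val for _, num_val in items[:n]]
        # Step 2, like SELECT SUM(num_val) FROM UNNEST(arr).
        sums[team] = sum(arr)
    return sums


sums = latest_n_sum(t1)
print(sums)
```

The per-team totals agree with the result table shown in the other answer (31 for ab, 27 for cd).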
