How do I calculate a rolling average of the day difference between events in BigQuery?



I have an events table like this:

date                  event_category     event_planner
2019-09-22T00:00:00   soccer_night       Marcus
2019-09-25T00:00:00   comedy_night       John
2019-09-28T00:00:00   dance_party        John
2019-10-02T00:00:00   soccer_night       Marcus

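The idea is to get a rolling average of the gaps between each planner's event dates. So far I have the distance (in days) for each planner, separated by category, like this:

DATE_DIFF(SAFE_CAST(date AS date), LAG(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner ORDER BY date), day) AS result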

What I expect is something like this:

date                  event_category     event_planner     rolling_avg
2019-09-22T00:00:00   soccer_night       Marcus            0
2019-09-25T00:00:00   comedy_night       John              0
2019-09-28T00:00:00   comedy_night       John              3
2019-10-02T00:00:00   soccer_night       Marcus            10
2019-10-10T00:00:00   comedy_night       John              7
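The LAG() expression from the question can be modelled in plain Python on the five rows of this expected output, to see which gaps it produces per (event_category, event_planner) partition. This is only a sketch of the SQL semantics, not BigQuery code: it assumes the `date` column has already been converted to a date (the SAFE_CAST step), and the helper name `lag_gaps` is hypothetical.

```python
from datetime import date
from itertools import groupby

# Rows from the expected-output table: (date, event_category, event_planner).
rows = [
    (date(2019, 9, 22), "soccer_night", "Marcus"),
    (date(2019, 9, 25), "comedy_night", "John"),
    (date(2019, 9, 28), "comedy_night", "John"),
    (date(2019, 10, 2), "soccer_night", "Marcus"),
    (date(2019, 10, 10), "comedy_night", "John"),
]

def lag_gaps(rows):
    """Mimic DATE_DIFF(day, LAG(day) OVER (PARTITION BY category, planner ORDER BY day), DAY)."""
    result = {}
    # Sort by partition key, then date, so each partition is contiguous and ordered.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    for _, part in groupby(ordered, key=lambda r: (r[1], r[2])):
        prev = None  # LAG() is NULL for the first row of each partition
        for d, cat, planner in part:
            result[(d, cat, planner)] = None if prev is None else (d - prev).days
            prev = d
    return result

gaps = lag_gaps(rows)
print([gaps[r] for r in rows])  # [None, None, 3, 10, 12]
```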

Below is for BigQuery Standard SQL:

#standardSQL
SELECT * EXCEPT(day, diff),
  IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day), 0) rolling_avg
FROM (
  SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
  FROM (
    SELECT *, SAFE_CAST(date AS DATE) AS day
    FROM `project.dataset.table`
  )
)

If you apply it to the sample data from your question, as in the example below:

WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2019-09-22T00:00:00' date, 'soccer_night' event_category, 'Marcus' event_planner UNION ALL
  SELECT '2019-09-25T00:00:00', 'comedy_night', 'John' UNION ALL
  SELECT '2019-09-28T00:00:00', 'comedy_night', 'John' UNION ALL
  SELECT '2019-10-02T00:00:00', 'soccer_night', 'Marcus' UNION ALL
  SELECT '2019-10-10T00:00:00', 'comedy_night', 'John'
)

the result is

Row date                    event_category  event_planner   rolling_avg  
1   2019-09-22 00:00:00 UTC soccer_night    Marcus          0    
2   2019-09-25 00:00:00 UTC comedy_night    John            0    
3   2019-09-28 00:00:00 UTC comedy_night    John            3.0  
4   2019-10-02 00:00:00 UTC soccer_night    Marcus          10.0     
5   2019-10-10 00:00:00 UTC comedy_night    John            7.5    
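A small Python model of the three nested steps (cast to a date, LAG() gap, running AVG() wrapped in IFNULL) on the same five sample rows reproduces this table. It is a sketch for checking the semantics, not BigQuery code; the function name `rolling_avg` is illustrative, and it relies on there being no duplicate dates within a partition (so the default RANGE frame behaves like a ROWS frame).

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the WITH clause: (date, event_category, event_planner).
rows = [
    ("2019-09-22T00:00:00", "soccer_night", "Marcus"),
    ("2019-09-25T00:00:00", "comedy_night", "John"),
    ("2019-09-28T00:00:00", "comedy_night", "John"),
    ("2019-10-02T00:00:00", "soccer_night", "Marcus"),
    ("2019-10-10T00:00:00", "comedy_night", "John"),
]

def rolling_avg(rows):
    out = {}
    # Innermost subquery: SAFE_CAST(date AS DATE) AS day.
    keyed = [(datetime.fromisoformat(s).date(), cat, who, s) for s, cat, who in rows]
    keyed.sort(key=lambda r: (r[1], r[2], r[0]))
    for _, part in groupby(keyed, key=lambda r: (r[1], r[2])):
        prev, diffs = None, []
        for day, cat, who, s in part:
            # Middle subquery: DATE_DIFF(day, LAG(day) ..., DAY) AS diff.
            diffs.append(None if prev is None else (day - prev).days)
            prev = day
            # Outer query: AVG over partition start .. current row, skipping NULLs,
            # then IFNULL(..., 0) when no non-NULL gap exists yet.
            seen = [d for d in diffs if d is not None]
            out[s, cat, who] = sum(seen) / len(seen) if seen else 0
    return [out[r] for r in rows]

print(rolling_avg(rows))  # [0, 0, 3.0, 10.0, 7.5]
```

The last John row is 7.5 because the running frame averages both of his gaps, (3 + 12) / 2.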

How should I modify this to use the average over only the last three events of the same type by the same planner?

#standardSQL
SELECT * EXCEPT(day, diff),
  IFNULL(AVG(diff) OVER(PARTITION BY event_category, event_planner ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 0) rolling_avg
FROM (
  SELECT *, DATE_DIFF(day, LAG(day) OVER(PARTITION BY event_category, event_planner ORDER BY day), DAY) diff
  FROM (
    SELECT *, SAFE_CAST(date AS DATE) AS day
    FROM `project.dataset.table`
  )
)
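The only change is the window frame. Here is a Python sketch of what `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` averages, using a hypothetical run of gaps for one planner and category (the sample data has at most three rows per partition, so it cannot show the difference). The helper `framed_avg` and its `preceding` parameter are illustrative names.

```python
def framed_avg(diffs, preceding=2):
    """AVG(diff) OVER (... ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW), then IFNULL(..., 0)."""
    out = []
    for i in range(len(diffs)):
        frame = diffs[max(0, i - preceding): i + 1]  # current row plus up to `preceding` earlier rows
        seen = [d for d in frame if d is not None]   # AVG ignores NULLs
        out.append(sum(seen) / len(seen) if seen else 0)
    return out

# Hypothetical gaps in days; the first one is NULL because LAG() has no previous row.
diffs = [None, 3, 12, 6, 9]
print(framed_avg(diffs))               # [0, 3.0, 7.5, 7.0, 9.0]
print(framed_avg(diffs, preceding=1))  # [0, 3.0, 7.5, 9.0, 7.5]
```

Three gaps span four events. If "last three events" should mean only the two gaps between them, the frame would be `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW`, which the `preceding=1` call models.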

You can use lag() to get the previous date in a subquery, then compute the rolling average in the outer query:

-- assumes `date` is a DATE column; date_diff(..., day) compares DATE values
select
  t.*,
  avg(date_diff(date, lag_date, day)) over (
    partition by event_category, event_planner order by date
  ) rolling_avg
from (
  select
    t.*,
    lag(date) over (
      partition by event_category, event_planner order by date
    ) lag_date
  from mytable t
) t
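Unlike the IFNULL version, this query leaves the first event of each partition as NULL rather than 0, because SQL AVG() skips NULLs and returns NULL when the frame holds nothing else. A short Python model of that behaviour, using John's comedy_night gaps from the sample data (the helper name `sql_avg` is illustrative):

```python
def sql_avg(values):
    """SQL AVG(): ignores NULLs (None) and returns NULL when no non-NULL value remains."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen) if seen else None

# Running frames over John's comedy_night gaps: NULL (from LAG), 3, 12.
gaps = [None, 3, 12]
running = [sql_avg(gaps[: i + 1]) for i in range(len(gaps))]
print(running)  # [None, 3.0, 7.5]
```

Wrapping the window expression in IFNULL(..., 0) gives the 0 shown in the expected output.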

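For the running average you can skip LAG() entirely: the gaps between consecutive events add up to the current date minus the first date, so divide that by the number of events so far minus one: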

(DATE_DIFF(SAFE_CAST(date AS date),
           MIN(SAFE_CAST(date AS date)) OVER (PARTITION BY event_category, event_planner ORDER BY date),
           day
          ) /
 NULLIF(COUNT(*) OVER (PARTITION BY event_category, event_planner ORDER BY date) - 1, 0)
) AS result
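A quick Python check that this kind of expression (current date minus first date, divided by rows so far minus one) matches the plain average of consecutive gaps, using John's comedy_night dates from the sample data. The helper names `mean_of_gaps` and `shortcut` are illustrative only.

```python
from datetime import date

def mean_of_gaps(days):
    """Average of consecutive gaps, computed the long way (LAG + AVG)."""
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return sum(gaps) / len(gaps)

def shortcut(days):
    """Same value from the telescoping sum: (last - first) / (count - 1)."""
    return (days[-1] - days[0]).days / (len(days) - 1)

# John's comedy_night dates from the sample data, in date order.
john = [date(2019, 9, 25), date(2019, 9, 28), date(2019, 10, 10)]

for n in range(2, len(john) + 1):
    prefix = john[:n]  # rows from the partition start up to the current row
    assert mean_of_gaps(prefix) == shortcut(prefix)
    print(prefix[-1], shortcut(prefix))
# 2019-09-28 3.0
# 2019-10-10 7.5
```

For the first row of a partition the count minus one is 0, which is why the SQL wraps it in NULLIF to avoid a division by zero.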
