Filling in missing dates for a rolling average in Snowflake with a CTE



I have two tables: activity and purchase.

Activity table:

user_id     date      videos_watched
1     2020-01-02        3
1     2020-01-04        5
1     2020-01-07        5

Purchase table:

user_id  purchase_date 
1       2020-01-01 
2       2020-02-02

What I want is the average number of videos watched per day over the 30 days following the purchase.

The basic query looks like this:

SELECT
DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(A.VIDEOS_WATCHED) AS avg_videos_watched
FROM PURCHASE P
LEFT OUTER JOIN ACTIVITY A ON P.USER_ID = A.USER_ID AND
A.DATE >= P.PURCHASE_DATE AND A.DATE <= DATEADD(DAY, 30, P.PURCHASE_DATE)
GROUP BY 1;

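However, the activity table only has a row for each day on which videos were actually watched. I would like to fill in the gaps for the days on which no videos were watched.

I have started looking into using a CTE like this: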

WITH cte AS (
SELECT date('2020-01-01') as fdate
UNION ALL
SELECT CAST(DATEADD(day,1,fdate) as date)
FROM cte
WHERE fdate < date('2020-04-01')
) select * from cte 
cross join purchases p
left outer join activity a 
on p.user_id = a.user_id 
and a.fdate = p.purchase_date
and a.date >= p.purchase_date and a.date <= dateadd(day, 30, p.purchase_date)

The end goal is to have something like this:

days_since_purchase    videos_watched
1                   3
2                   0 --CTE coalesce inserted value
3                   0
4                   5

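I have been trying to get this right for the past few hours, but I still cannot quite get the hang of it.

If you want to fill in the gaps in the result set, then I think you should generate integers rather than dates: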

WITH cte AS (
    SELECT 1 AS day_since_purchase
    UNION ALL
    SELECT 1 + day_since_purchase
    FROM cte
    WHERE day_since_purchase < 4
)
SELECT cte.day_since_purchase,
       COALESCE(pa.avg_videos_watched, 0) AS avg_videos_watched
FROM cte
LEFT JOIN (
    SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
           AVG(a.videos_watched) AS avg_videos_watched
    FROM purchases p
    JOIN activity a
      ON p.user_id = a.user_id
     AND a.fdate = p.purchase_date
     AND a.date >= p.purchase_date
     AND a.date <= DATEADD(DAY, 30, p.purchase_date)
    GROUP BY 1
) pa
  ON pa.day_since_purchase = cte.day_since_purchase;

You can use a recursive query to generate the 30 days following each purchase, then bring in the activity table:

with cte as (
    select
        purchase_date,
        user_id,
        0 days_since_purchase,
        purchase_date dt
    from purchases
    union all
    select
        purchase_date,
        user_id,
        days_since_purchase + 1,
        dateadd(day, days_since_purchase + 1, purchase_date)
    from cte
    where days_since_purchase < 30
)
select
    c.days_since_purchase,
    avg(coalesce(a.videos_watched, 0)) avg_videos_watched
from cte c
left join activity a
    on  a.user_id = c.user_id
    and a.fdate = c.purchase_date
    and a.date = c.dt
group by c.days_since_purchase

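It is not clear from your question whether the activity table has a column that stores the purchase date each row relates to. Your query uses an fdate column, but the sample data does not show one. I used that column in the query (without it, you might count the same activity under several different purchases).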
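
As a rough check of the per-purchase approach, the sketch below runs an equivalent recursive query on the sample data in an in-memory SQLite database from Python. It rests on a few assumptions: SQLite stands in for Snowflake (so `date(x, '+N days')` replaces `DATEADD`), the key column is `user_id` as in the sample data, and the `fdate` condition is omitted because the sample data does not contain that column.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT);
    CREATE TABLE activity  (user_id INTEGER, date TEXT, videos_watched INTEGER);
    INSERT INTO purchases VALUES (1, '2020-01-01'), (2, '2020-02-02');
    INSERT INTO activity  VALUES (1, '2020-01-02', 3),
                                 (1, '2020-01-04', 5),
                                 (1, '2020-01-07', 5);
""")

query = """
WITH RECURSIVE cal(user_id, purchase_date, days_since_purchase, dt) AS (
    -- Anchor: one row per purchase, at day offset 0.
    SELECT user_id, purchase_date, 0, purchase_date FROM purchases
    UNION ALL
    -- Recursive step: extend each purchase's calendar by one day, up to day 30.
    SELECT user_id, purchase_date, days_since_purchase + 1,
           date(purchase_date, '+' || (days_since_purchase + 1) || ' days')
    FROM cal
    WHERE days_since_purchase < 30
)
-- Every (purchase, day) pair survives the LEFT JOIN; missing days count as 0.
SELECT c.days_since_purchase,
       AVG(COALESCE(a.videos_watched, 0)) AS avg_videos_watched
FROM cal c
LEFT JOIN activity a
  ON a.user_id = c.user_id
 AND a.date = c.dt
GROUP BY c.days_since_purchase
ORDER BY c.days_since_purchase
"""

rows = conn.execute(query).fetchall()
for day, avg_videos in rows:
    print(day, avg_videos)
```

Because the zeros are filled in before averaging, every purchase contributes to every day. With the sample data, day 1 averages to 1.5 rather than 3, since user 2 has no activity on their day 1 and counts as 0. Whether that is the desired definition of the average depends on what you want to measure.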
