基于在另一个表中配置的时间窗口期间在事件表上聚合



我有三个表,UpEvent、DownEvent和AnalysisWindow

UpEvent:
up_event_id | event_date            |  EventMetric
1              2015-01-01T06:00:00       54
2              2015-01-01T07:30:00       76
DownEvent:
down_event_id | event_date          |  EventMetric
1              2015-01-01T06:46:00         22
2              2015-01-01T07:33:00         34
AnalysisWindow:
window_id |        win_start           |    win_end
1              2015-01-01T00:00:00       2015-01-01T04:00:00
2              2015-01-01T00:00:00       2015-01-01T08:00:00
.
.

我想在每个AnalysisWindow中进行分析,以便聚合在定义的窗口之间发生的UpEvent和DownEvent。

因此,对于每个AnalysisWindow记录,我最终会得到一个特征行:

WinStart             |  WinEnd               |   TotalUpEvents  |  TotalDownEvents
2015-01-01T00:00:00    2015-01-01T04:00:00         0                  0
2015-01-01T00:00:00    2015-01-01T08:00:00         2                  2

我的第一个想法是做一些类似的事情

select win.win_start, 
win.win_end, 
count(ue.*), 
sum(ue.EventMetric) 
from AnalysisWindow win
left join UpEvent ue on (ue.event_date between win.win_start and win.win_end)

这显然不起作用。

我处理这个问题的方法不对吗?我想对我配置的各个窗口的表进行窗口化分析,每个窗口获得1个聚合记录

下面是针对BigQuery标准SQL的(并且实际有效!(

#standardSQL
WITH ue_win AS ( 
SELECT 
window_id, COUNT(1) TotalUpEvents  
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.UpEvent` ue
WHERE ue.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
), de_win AS ( 
SELECT 
window_id, COUNT(1) TotalDownEvents  
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.DownEvent` de
WHERE de.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
)
SELECT 
window_id, win_start, win_end, 
IFNULL(TotalUpEvents, 0) TotalUpEvents, 
IFNULL(TotalDownEvents, 0) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
LEFT JOIN ue_win USING(window_id)
LEFT JOIN de_win USING(window_id)

一种方法使用相关的子查询:

select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end)
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end)
) as downs
from AnalysisWindow aw;

以上内容有效,至少当公式化为:时

with UpEvent as (
select 1 as up_event_id, '2015-01-01T06:00:00' as event_date, 54 as EventMetric union all
select 2, '2015-01-01T07:30:00', 76
),
DownEvent as (
select 1 as down_event_id, '2015-01-01T06:46:00' as event_date, 22 as EventMetric union all
select 2, '2015-01-01T07:33:00', 34
),
AnalysisWindow as (
select 1 as window_id , '2015-01-01T00:00:00' as win_start, '2015-01-01T04:00:00' as win_end union all
select 2, '2015-01-01T00:00:00', '2015-01-01T08:00:00'
)
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;

另一种选择是使用union all:

ud as (
select event_date, 1 as ups, 0 as downs from upevent
union all
select event_date, 0 as ups, 1 as downs from downevent
)
select aw.window_id, aw.win_start, aw.win_end, sum(ups), sum(downs)
from AnalysisWindow aw join
ud
ON ud.event_date between aw.win_start and aw.win_end
group by aw.window_id, aw.win_start, aw.win_end
union all
select aw.window_id, aw.win_start, aw.win_end, 0, 0
from AnalysisWindow aw
where not exists (select 1 from ud where ud.event_date between aw.win_start and aw.win_end)

最新更新