Joining interval-aggregated data in BigQuery



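I have a base table in BigQuery with per-minute data covering the whole day for a number of dates. I have created several other tables that hold the 15-minute and daily sums. Sample tables are shown below.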

user    age date            time            output
abc 24  2023-02-15  11:00:00    3
abc 24  2023-02-15  11:01:00    4
abc 24  2023-02-15  11:02:00    6
abc 24  2023-02-15  11:03:00    3
abc 24  2023-02-15  11:04:00    5
abc 24  2023-02-15  11:05:00    62
abc 24  2023-02-15  11:06:00    5
abc 24  2023-02-15  11:07:00    23
abc 24  2023-02-15  11:08:00    5
abc 24  2023-02-15  11:09:00    3
abc 24  2023-02-15  11:10:00    8
abc 24  2023-02-15  11:11:00    6
abc 24  2023-02-15  11:12:00    3
abc 24  2023-02-15  11:13:00    45
abc 24  2023-02-15  11:14:00    2
abc 24  2023-02-15  11:15:00    4
abc 24  2023-02-15  11:16:00    2
abc 24  2023-02-15  11:17:00    12
abc 24  2023-02-15  11:18:00    3
abc 24  2023-02-15  11:19:00    44
abc 24  2023-02-15  11:20:00    20
abc 24  2023-02-15  11:21:00    23
abc 24  2023-02-15  11:22:00    4
abc 24  2023-02-15  11:23:00    6
abc 24  2023-02-15  11:24:00    28
abc 24  2023-02-15  11:25:00    12
abc 24  2023-02-15  11:26:00    22
abc 24  2023-02-15  11:27:00    8
abc 24  2023-02-15  11:28:00    8
abc 24  2023-02-15  11:29:00    5

15-minute sums:

user    date            time_15min  15sum_output
abc 2023-02-15  11:00:00    183
abc 2023-02-15  11:15:00    201

Daily sums:

user    date            dailysum_output
abc 2023-02-15  384

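I want to join the tables above to create a final table like the one shown below: all columns from the first table, followed by the aggregated columns matched on date and time. Could you suggest how to achieve this in BigQuery SQL?
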
user    age date            time           output   15sum_output    dailysum_output
abc 24  2023-02-15  11:00:00    3   183         384
abc 24  2023-02-15  11:01:00    4   183         384
abc 24  2023-02-15  11:02:00    6   183         384
abc 24  2023-02-15  11:03:00    3   183         384
abc 24  2023-02-15  11:04:00    5   183         384
abc 24  2023-02-15  11:05:00    62  183         384
abc 24  2023-02-15  11:06:00    5   183         384
abc 24  2023-02-15  11:07:00    23  183         384
abc 24  2023-02-15  11:08:00    5   183         384
abc 24  2023-02-15  11:09:00    3   183         384
abc 24  2023-02-15  11:10:00    8   183         384
abc 24  2023-02-15  11:11:00    6   183         384
abc 24  2023-02-15  11:12:00    3   183         384
abc 24  2023-02-15  11:13:00    45  183         384
abc 24  2023-02-15  11:14:00    2   183         384
abc 24  2023-02-15  11:15:00    4   201         384
abc 24  2023-02-15  11:16:00    2   201         384
abc 24  2023-02-15  11:17:00    12  201         384
abc 24  2023-02-15  11:18:00    3   201         384
abc 24  2023-02-15  11:19:00    44  201         384
abc 24  2023-02-15  11:20:00    20  201         384
abc 24  2023-02-15  11:21:00    23  201         384
abc 24  2023-02-15  11:22:00    4   201         384
abc 24  2023-02-15  11:23:00    6   201         384
abc 24  2023-02-15  11:24:00    28  201         384
abc 24  2023-02-15  11:25:00    12  201         384
abc 24  2023-02-15  11:26:00    22  201         384
abc 24  2023-02-15  11:27:00    8   201         384
abc 24  2023-02-15  11:28:00    8   201         384
abc 24  2023-02-15  11:29:00    5   201         384

left join with date

You can LEFT JOIN the aggregate tables to the base table as shown below.

-- sample data
WITH sample_table AS (
  -- put your sample table here
),
-- actual query starts here
-- 15-minute sums: time_slot is the index of the 15-minute slot within the day,
-- time_15min is the start time of that slot
`15min_slots` AS (
  SELECT user, date,
         DIV(TIME_DIFF(time, '0:0:0', SECOND), 15 * 60) time_slot,
         TIME_ADD(
           '0:0:0',
           INTERVAL DIV(TIME_DIFF(time, '0:0:0', SECOND), 15 * 60) * 15 MINUTE
         ) time_15min,
         SUM(output) `15sum_output`
    FROM sample_table
   GROUP BY 1, 2, 3, 4
),
-- daily sums per user and date
daily_sum AS (
  SELECT user, date, SUM(output) dailysum_output
    FROM sample_table
   GROUP BY 1, 2
)
-- attach both aggregates to every per-minute row
SELECT t.*, s.`15sum_output`, ds.dailysum_output
  FROM sample_table t
  LEFT JOIN `15min_slots` s
    ON t.user = s.user AND t.date = s.date
   AND DIV(TIME_DIFF(t.time, '0:0:0', SECOND), 15 * 60) = s.time_slot
  LEFT JOIN daily_sum ds
    ON t.user = ds.user AND t.date = ds.date
;
-- query result
+------+-----+------------+----------+--------+--------------+-----------------+
| user | age |    date    |   time   | output | 15sum_output | dailysum_output |
+------+-----+------------+----------+--------+--------------+-----------------+
| abc  |  24 | 2023-02-15 | 11:00:00 |      3 |          183 |             384 |
| abc  |  24 | 2023-02-15 | 11:01:00 |      4 |          183 |             384 |
| abc  |  24 | 2023-02-15 | 11:02:00 |      6 |          183 |             384 |
~~~~~~~
| abc  |  24 | 2023-02-15 | 11:13:00 |     45 |          183 |             384 |
| abc  |  24 | 2023-02-15 | 11:14:00 |      2 |          183 |             384 |
| abc  |  24 | 2023-02-15 | 11:15:00 |      4 |          201 |             384 |
| abc  |  24 | 2023-02-15 | 11:16:00 |      2 |          201 |             384 |
~~~~~~~~
| abc  |  24 | 2023-02-15 | 11:27:00 |      8 |          201 |             384 |
| abc  |  24 | 2023-02-15 | 11:28:00 |      8 |          201 |             384 |
| abc  |  24 | 2023-02-15 | 11:29:00 |      5 |          201 |             384 |
+------+-----+------------+----------+--------+--------------+-----------------+
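
If the 15-minute and daily tables already exist, as described in the question, they could also be joined directly instead of being recomputed. The following is only a sketch under assumptions: the table names base_table, agg_15min and agg_daily are hypothetical placeholders (in practice they would need a dataset qualifier), and time_15min is assumed to be a TIME column holding the start of each 15-minute slot.

```sql
-- Hypothetical table names: base_table, agg_15min, agg_daily.
-- Assumes agg_15min.time_15min is a TIME holding the slot start (11:00:00, 11:15:00, ...).
SELECT
  t.*,
  s.`15sum_output`,
  d.dailysum_output
FROM base_table AS t
LEFT JOIN agg_15min AS s
  ON  t.user = s.user
  AND t.date = s.date
  -- Round t.time down to the start of its 15-minute slot before comparing.
  AND TIME_ADD(
        TIME '00:00:00',
        INTERVAL DIV(TIME_DIFF(t.time, TIME '00:00:00', SECOND), 15 * 60) * 15 MINUTE
      ) = s.time_15min
LEFT JOIN agg_daily AS d
  ON  t.user = d.user
  AND t.date = d.date;
```

The slot arithmetic is the same as in the query above: for 11:07:00, the offset from midnight is 40020 seconds, DIV(40020, 900) is 44, and 44 * 15 minutes after midnight is 11:00:00.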

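However, I would recommend using window functions as shown below instead. The query returns the same 15sum_output and dailysum_output values as the one above without the extra CTEs and joins; the example also adds hourly and weekly sums.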

WITH sample_table AS (
  -- put your sample table here
)
SELECT *,
       SUM(output) OVER w0 AS `15sum_output`, SUM(output) OVER w1 AS dailysum_output,
       SUM(output) OVER w2 AS hourly_output, SUM(output) OVER w3 AS weekly_output
  FROM sample_table
WINDOW w0 AS (PARTITION BY user, date, DIV(TIME_DIFF(time, '0:0:0', SECOND), 15 * 60)),  -- 15-minute slot
       w1 AS (PARTITION BY user, date),                           -- day
       w2 AS (PARTITION BY user, date, EXTRACT(HOUR FROM time)),  -- hour
       -- week number only: it repeats every year, so add the year to the partition
       -- if the data spans more than one year
       w3 AS (PARTITION BY user, EXTRACT(WEEK FROM date));
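
To try the window-function approach without an existing table, a small inline data set can stand in for sample_table. This is a minimal sketch, assuming only four of the sample rows (two per 15-minute slot) and the column types STRING, INT64, DATE, TIME and INT64; the real table may differ.

```sql
WITH sample_table AS (
  -- Four rows taken from the question's sample data, two per 15-minute slot.
  SELECT 'abc' AS user, 24 AS age, DATE '2023-02-15' AS date, time, output
  FROM UNNEST([
    STRUCT(TIME '11:00:00' AS time, 3 AS output),
    STRUCT(TIME '11:14:00', 2),
    STRUCT(TIME '11:15:00', 4),
    STRUCT(TIME '11:29:00', 5)
  ])
)
SELECT
  *,
  -- Sum within the same user, date and 15-minute slot.
  SUM(output) OVER (
    PARTITION BY user, date, DIV(TIME_DIFF(time, TIME '00:00:00', SECOND), 15 * 60)
  ) AS `15sum_output`,
  -- Sum within the same user and date.
  SUM(output) OVER (PARTITION BY user, date) AS dailysum_output
FROM sample_table
ORDER BY time;
```

With these four rows, 15sum_output should come out as 5 for the 11:00 and 11:14 rows and 9 for the 11:15 and 11:29 rows, and dailysum_output as 14 on every row.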
