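I have a base table in BigQuery with per-minute data covering whole days across different dates. I have also created several other tables that compute 15-minute and daily sums. Sample tables are below: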
user age date time output
abc 24 2023-02-15 11:00:00 3
abc 24 2023-02-15 11:01:00 4
abc 24 2023-02-15 11:02:00 6
abc 24 2023-02-15 11:03:00 3
abc 24 2023-02-15 11:04:00 5
abc 24 2023-02-15 11:05:00 62
abc 24 2023-02-15 11:06:00 5
abc 24 2023-02-15 11:07:00 23
abc 24 2023-02-15 11:08:00 5
abc 24 2023-02-15 11:09:00 3
abc 24 2023-02-15 11:10:00 8
abc 24 2023-02-15 11:11:00 6
abc 24 2023-02-15 11:12:00 3
abc 24 2023-02-15 11:13:00 45
abc 24 2023-02-15 11:14:00 2
abc 24 2023-02-15 11:15:00 4
abc 24 2023-02-15 11:16:00 2
abc 24 2023-02-15 11:17:00 12
abc 24 2023-02-15 11:18:00 3
abc 24 2023-02-15 11:19:00 44
abc 24 2023-02-15 11:20:00 20
abc 24 2023-02-15 11:21:00 23
abc 24 2023-02-15 11:22:00 4
abc 24 2023-02-15 11:23:00 6
abc 24 2023-02-15 11:24:00 28
abc 24 2023-02-15 11:25:00 12
abc 24 2023-02-15 11:26:00 22
abc 24 2023-02-15 11:27:00 8
abc 24 2023-02-15 11:28:00 8
abc 24 2023-02-15 11:29:00 5
user date time_15min 15sum_output
abc 2023-02-15 11:00:00 183
abc 2023-02-15 11:15:00 201
user date dailysum_output
abc 2023-02-15 384
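To make the aggregation concrete: each minute row belongs to the 15-minute bucket whose index is its seconds-since-midnight integer-divided by 900 (15 × 60). Below is a minimal sketch that checks the arithmetic over the sample rows above and reproduces the 183, 201 and 384 totals. It is plain Python rather than BigQuery, and `bucket_start` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import time

# Per-minute outputs from 11:00 to 11:29, in the order of the base table above.
OUTPUTS = [3, 4, 6, 3, 5, 62, 5, 23, 5, 3, 8, 6, 3, 45, 2,
           4, 2, 12, 3, 44, 20, 23, 4, 6, 28, 12, 22, 8, 8, 5]

def bucket_start(t: time, minutes: int = 15) -> time:
    """Floor a time of day to the start of its `minutes`-wide bucket (hypothetical helper)."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    slot = seconds // (minutes * 60)      # bucket index since midnight
    start = slot * minutes * 60           # bucket start, in seconds since midnight
    return time(start // 3600, (start % 3600) // 60)

# (user, date, time, output) rows, one per minute starting at 11:00.
rows = [("abc", "2023-02-15", time(11, i), out) for i, out in enumerate(OUTPUTS)]

sums_15min = defaultdict(int)  # keyed by (user, date, bucket start)
sums_daily = defaultdict(int)  # keyed by (user, date)
for user, day, t, out in rows:
    sums_15min[(user, day, bucket_start(t))] += out
    sums_daily[(user, day)] += out

for (user, day, start), total in sorted(sums_15min.items()):
    print(user, day, start.isoformat(), total)
print(dict(sums_daily))
```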
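I want to join the tables above to create the final table shown below: all the columns from the first table, plus the aggregate columns matched on date and time. Could you suggest how to do this in BigQuery SQL?
user age date time output 15sum_output dailysum_output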
abc 24 2023-02-15 11:00:00 3 183 384
abc 24 2023-02-15 11:01:00 4 183 384
abc 24 2023-02-15 11:02:00 6 183 384
abc 24 2023-02-15 11:03:00 3 183 384
abc 24 2023-02-15 11:04:00 5 183 384
abc 24 2023-02-15 11:05:00 62 183 384
abc 24 2023-02-15 11:06:00 5 183 384
abc 24 2023-02-15 11:07:00 23 183 384
abc 24 2023-02-15 11:08:00 5 183 384
abc 24 2023-02-15 11:09:00 3 183 384
abc 24 2023-02-15 11:10:00 8 183 384
abc 24 2023-02-15 11:11:00 6 183 384
abc 24 2023-02-15 11:12:00 3 183 384
abc 24 2023-02-15 11:13:00 45 183 384
abc 24 2023-02-15 11:14:00 2 183 384
abc 24 2023-02-15 11:15:00 4 201 384
abc 24 2023-02-15 11:16:00 2 201 384
abc 24 2023-02-15 11:17:00 12 201 384
abc 24 2023-02-15 11:18:00 3 201 384
abc 24 2023-02-15 11:19:00 44 201 384
abc 24 2023-02-15 11:20:00 20 201 384
abc 24 2023-02-15 11:21:00 23 201 384
abc 24 2023-02-15 11:22:00 4 201 384
abc 24 2023-02-15 11:23:00 6 201 384
abc 24 2023-02-15 11:24:00 28 201 384
abc 24 2023-02-15 11:25:00 12 201 384
abc 24 2023-02-15 11:26:00 22 201 384
abc 24 2023-02-15 11:27:00 8 201 384
abc 24 2023-02-15 11:28:00 8 201 384
abc 24 2023-02-15 11:29:00 5 201 384
left join with date
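You can LEFT JOIN the aggregates to the base table like below. In this query both aggregates are recomputed from the base table in CTEs. The 15-minute slot is the integer index of the 15-minute interval since midnight, and each base row is joined on user, date and that slot.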
-- sample data
WITH sample_table AS (
  -- put your sample table here (columns: user, age, date, time, output)
),
-- actual query starts here
-- 15-minute totals; time_slot = index of the 15-minute interval since midnight
`15min_slots` AS (
  SELECT user, date,
         DIV(TIME_DIFF(time, TIME '00:00:00', SECOND), 15 * 60) time_slot,
         -- start of the slot, e.g. 11:07:00 -> 11:00:00
         TIME_ADD(
           TIME '00:00:00',
           INTERVAL DIV(TIME_DIFF(time, TIME '00:00:00', SECOND), 15 * 60) * 15 MINUTE
         ) time_15min,
         SUM(output) `15sum_output`
  FROM sample_table
  GROUP BY 1, 2, 3, 4
),
-- daily totals
daily_sum AS (
  SELECT user, date, SUM(output) dailysum_output
  FROM sample_table
  GROUP BY 1, 2
)
SELECT t.*, s.`15sum_output`, ds.dailysum_output
FROM sample_table t
LEFT JOIN `15min_slots` s
  ON t.user = s.user AND t.date = s.date
 AND DIV(TIME_DIFF(t.time, TIME '00:00:00', SECOND), 15 * 60) = s.time_slot
LEFT JOIN daily_sum ds
  ON t.user = ds.user AND t.date = ds.date
;
-- query result
+------+-----+------------+----------+--------+--------------+-----------------+
| user | age | date | time | output | 15sum_output | dailysum_output |
+------+-----+------------+----------+--------+--------------+-----------------+
| abc | 24 | 2023-02-15 | 11:00:00 | 3 | 183 | 384 |
| abc | 24 | 2023-02-15 | 11:01:00 | 4 | 183 | 384 |
| abc | 24 | 2023-02-15 | 11:02:00 | 6 | 183 | 384 |
~~~~~~~
| abc | 24 | 2023-02-15 | 11:13:00 | 45 | 183 | 384 |
| abc | 24 | 2023-02-15 | 11:14:00 | 2 | 183 | 384 |
| abc | 24 | 2023-02-15 | 11:15:00 | 4 | 201 | 384 |
| abc | 24 | 2023-02-15 | 11:16:00 | 2 | 201 | 384 |
~~~~~~~~
| abc | 24 | 2023-02-15 | 11:27:00 | 8 | 201 | 384 |
| abc | 24 | 2023-02-15 | 11:28:00 | 8 | 201 | 384 |
| abc | 24 | 2023-02-15 | 11:29:00 | 5 | 201 | 384 |
+------+-----+------------+----------+--------+--------------+-----------------+
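BigQuery cannot be run locally, so here is a rough check of the join approach using Python's built-in `sqlite3` module. Note that this is SQLite, not BigQuery. The `DIV(TIME_DIFF(...))` slot expression is replaced by an assumed SQLite equivalent: `strftime('%s', ...)` gives seconds since midnight, which is then integer-divided by 900. The table name `base` is hypothetical.

```python
import sqlite3

# Per-minute outputs from 11:00 to 11:29 in the sample base table.
OUTPUTS = [3, 4, 6, 3, 5, 62, 5, 23, 5, 3, 8, 6, 3, 45, 2,
           4, 2, 12, 3, 44, 20, 23, 4, 6, 28, 12, 22, 8, 8, 5]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE base (user TEXT, age INTEGER, date TEXT, time TEXT, output INTEGER)")
con.executemany(
    "INSERT INTO base VALUES (?, ?, ?, ?, ?)",
    [("abc", 24, "2023-02-15", f"11:{i:02d}:00", out) for i, out in enumerate(OUTPUTS)],
)

QUERY = """
WITH slots AS (
  -- 15-minute totals; slot = seconds since midnight / 900 (integer division)
  SELECT user, date,
         CAST(strftime('%s', '1970-01-01 ' || time) AS INTEGER) / 900 AS time_slot,
         SUM(output) AS "15sum_output"
  FROM base
  GROUP BY user, date, time_slot
),
daily_sum AS (
  SELECT user, date, SUM(output) AS dailysum_output
  FROM base
  GROUP BY user, date
)
SELECT t.user, t.age, t.date, t.time, t.output, s."15sum_output", ds.dailysum_output
FROM base t
LEFT JOIN slots s
  ON t.user = s.user AND t.date = s.date
 AND CAST(strftime('%s', '1970-01-01 ' || t.time) AS INTEGER) / 900 = s.time_slot
LEFT JOIN daily_sum ds
  ON t.user = ds.user AND t.date = ds.date
ORDER BY t.time
"""

rows = con.execute(QUERY).fetchall()
# Rows around the 11:15 slot boundary.
for row in rows[13:17]:
    print(row)
```

The printed rows around 11:15 should show the 15-minute total switching from 183 to 201 while the daily total stays at 384.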
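However, I would recommend using window functions instead, as below. This avoids the extra CTEs and joins. The 15sum_output and dailysum_output columns return the same values as the join above. The hourly_output and weekly_output columns show how the same pattern extends to other granularities, and you can drop them if you don't need them.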
WITH sample_table AS (
  -- put your sample table here
)
SELECT *,
       SUM(output) OVER w0 AS `15sum_output`,
       SUM(output) OVER w1 AS dailysum_output,
       SUM(output) OVER w2 AS hourly_output,
       SUM(output) OVER w3 AS weekly_output
FROM sample_table
-- w0: 15-minute slot, w1: day, w2: hour of the day
-- w3: week starting Sunday; DATE_TRUNC keeps same-numbered weeks of different years apart,
--     which EXTRACT(WEEK FROM date) alone would merge
WINDOW w0 AS (PARTITION BY user, date, DIV(TIME_DIFF(time, TIME '00:00:00', SECOND), 15 * 60)),
       w1 AS (PARTITION BY user, date),
       w2 AS (PARTITION BY user, date, EXTRACT(HOUR FROM time)),
       w3 AS (PARTITION BY user, DATE_TRUNC(date, WEEK));
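The window-function version can be checked the same way. This is again a SQLite sketch run through Python's `sqlite3`, and SQLite only supports window functions and the named `WINDOW` clause from version 3.25. It uses the same assumed slot expression and the hypothetical table name `base`, and reproduces only the 15-minute and daily columns.

```python
import sqlite3

# Per-minute outputs from 11:00 to 11:29 in the sample base table.
OUTPUTS = [3, 4, 6, 3, 5, 62, 5, 23, 5, 3, 8, 6, 3, 45, 2,
           4, 2, 12, 3, 44, 20, 23, 4, 6, 28, 12, 22, 8, 8, 5]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE base (user TEXT, age INTEGER, date TEXT, time TEXT, output INTEGER)")
con.executemany(
    "INSERT INTO base VALUES (?, ?, ?, ?, ?)",
    [("abc", 24, "2023-02-15", f"11:{i:02d}:00", out) for i, out in enumerate(OUTPUTS)],
)

# No GROUP BY or joins: each row keeps its columns and gets the partition totals.
QUERY = """
SELECT user, age, date, time, output,
       SUM(output) OVER w0 AS "15sum_output",
       SUM(output) OVER w1 AS dailysum_output
FROM base
WINDOW w0 AS (PARTITION BY user, date,
              CAST(strftime('%s', '1970-01-01 ' || time) AS INTEGER) / 900),  -- 15-minute slot
       w1 AS (PARTITION BY user, date)                                        -- whole day
ORDER BY time
"""

rows = con.execute(QUERY).fetchall()
for row in rows[13:17]:
    print(row)
```

A window-function partition here acts like the GROUP BY in the join version, except the total is attached to every row of the partition instead of collapsing it.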