假设我在Clickhouse中有下表:
f_datetime, f_user, f_tile
2021-07-08 07:00:00, x, a
2021-07-08 08:30:00, x, a
2021-07-08 08:45:00, x, a
2021-07-08 09:00:00, x, b
2021-07-08 11:00:00, x, b
2021-07-08 12:00:00, x, a
2021-07-08 15:00:00, x, a
2021-07-08 16:00:00, x, b
2021-07-08 20:00:00, x, b
我想要一个聚合查询来获得以下结果:
f_user, f_tile, f_duration
x, a, 105
x, b, 120
x, a, 180
x, b, 240
我想分组并计算持续时间,直到f_tile,f_user在有序表中按f_datetime更改为止。
有什么解决办法吗?
这是一个缺口和孤岛问题的例子。对于这个版本,最简单的解决方案可能是行号的差异:
select f_user, t_tile,
min(f_datetime), max(f_datetime),
date_diff('minute', min(f_datetime), max(f_datetime)) as f_duration
from (select t.*,
row_number() over (partition by f_user order by f_datetime) as seqnum,
row_number() over (partition by f_user, f_tile order by f_datetime) as seqnum_2
from t
) t
group by f_user, f_tile, (seqnum - seqnum_2)
表中的下一个日期时间值是该用户的最低值,即>=当前行的日期时间。
SELECT t.f_user, t.f_title,
(SELECT MIN(t1.f_datetime)
FROM Yourtable t1
WHERE t1.f_datetime >= t.f_datetime AND t1.f_user = t.f_user AND t1.f_tile = t.f_tile) - t.f_datetime
FROM Yourtable t
您可以应用DIFF函数,而不是减去这些值。
如果你想在每个连续的f_tile
段上进行计算(我猜是每个f_user
),这里有一种方法,使用窗口函数:
- 数据:初始表
- cte2:查找每个
f_user
运行的每个连续f_tile
的边 - cte3:为聚合的每次运行计算一个组(
grp
)指示符 - cte4:计算每次
f_tile
运行的持续时间
WITH cte2 AS ( -- Find edges of each f_tile run for each f_user by datetime
SELECT t.*
, CASE WHEN LAG(f_tile) OVER (PARTITION BY f_user ORDER BY f_datetime) = f_tile THEN 0 ELSE 1 END AS edge
FROM data AS t
)
, cte3 AS ( -- Assign a group (grp) indicator for each run for aggregation
SELECT t.*, SUM(edge) OVER (PARTITION BY f_user ORDER BY f_datetime) AS grp
FROM cte2 AS t
)
, cte4 AS (
SELECT f_user, f_tile, grp
, MIN(f_datetime) AS start
, DATE_DIFF('minute', MAX(f_datetime), MIN(f_datetime)) AS duration
FROM cte3 AS t
GROUP BY f_user, f_tile, grp
)
SELECT f_user, f_tile, duration
FROM cte4
ORDER BY start
;
结果:
+--------+--------+----------+
| f_user | f_tile | duration |
+--------+--------+----------+
| x | a | 105 |
| x | b | 120 |
| x | a | 180 |
| x | b | 240 |
+--------+--------+----------+
注意:我没有要测试的clickhouse实例。根据需要进行调整。我已经用另一台发动机测试了类似的发动机。