当Clickhouse中的列随时间变化时,按分组和聚合



假设我在Clickhouse中有下表:

f_datetime, f_user, f_tile
2021-07-08 07:00:00, x, a
2021-07-08 08:30:00, x, a
2021-07-08 08:45:00, x, a
2021-07-08 09:00:00, x, b
2021-07-08 11:00:00, x, b
2021-07-08 12:00:00, x, a
2021-07-08 15:00:00, x, a
2021-07-08 16:00:00, x, b
2021-07-08 20:00:00, x, b

我想要一个聚合查询来获得以下结果:

f_user, f_tile, f_duration
x, a, 105
x, b, 120
x, a, 180
x, b, 240

我想分组并计算持续时间,直到f_tile,f_user在有序表中按f_datetime更改为止。

有什么解决办法吗?

这是一个缺口和孤岛问题的例子。对于这个版本,最简单的解决方案可能是行号的差异:

select f_user, t_tile,
min(f_datetime), max(f_datetime),
date_diff('minute', min(f_datetime), max(f_datetime)) as f_duration
from (select t.*,
row_number() over (partition by f_user order by f_datetime) as seqnum,
row_number() over (partition by f_user, f_tile order by f_datetime) as seqnum_2
from t
) t
group by f_user, f_tile, (seqnum - seqnum_2)

表中的下一个日期时间值是该用户的最低值,即>=当前行的日期时间。

SELECT t.f_user, t.f_title,
(SELECT MIN(t1.f_datetime) 
FROM Yourtable t1 
WHERE t1.f_datetime >= t.f_datetime AND t1.f_user = t.f_user AND t1.f_tile = t.f_tile) - t.f_datetime
FROM Yourtable t

您可以应用DIFF函数,而不是减去这些值。

如果你想在每个连续的f_tile段上进行计算(我猜是每个f_user),这里有一种方法,使用窗口函数:

  • 数据:初始表
  • cte2:查找每个f_user运行的每个连续f_tile的边
  • cte3:为聚合的每次运行计算一个组(grp)指示符
  • cte4:计算每次f_tile运行的持续时间
WITH cte2 AS (  -- Find edges of each f_tile run for each f_user by datetime
SELECT t.*
, CASE WHEN LAG(f_tile) OVER (PARTITION BY f_user ORDER BY f_datetime) = f_tile THEN 0 ELSE 1 END AS edge
FROM data AS t
)
, cte3 AS (  -- Assign a group (grp) indicator for each run for aggregation
SELECT t.*, SUM(edge) OVER (PARTITION BY f_user ORDER BY f_datetime) AS grp
FROM cte2 AS t
)
, cte4 AS (
SELECT f_user, f_tile, grp
, MIN(f_datetime) AS start
, DATE_DIFF('minute', MAX(f_datetime), MIN(f_datetime)) AS duration
FROM cte3 AS t
GROUP BY f_user, f_tile, grp
)
SELECT f_user, f_tile, duration
FROM cte4
ORDER BY start
;

结果:

+--------+--------+----------+
| f_user | f_tile | duration |
+--------+--------+----------+
| x      | a      |      105 |
| x      | b      |      120 |
| x      | a      |      180 |
| x      | b      |      240 |
+--------+--------+----------+

注意:我没有要测试的clickhouse实例。根据需要进行调整。我已经用另一台发动机测试了类似的发动机。

最新更新