Grouping and ranking consecutive values by condition in SQL

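I have a table, mytable, to which I want to add two columns.

My goal is to group by user_id and mobile_id wherever there is a run of consecutive rows with difftime > -600. The run must be consecutive in created_at (a timestamp) and each row in it is given a rank. The rank restarts when, for the same user and mobile id, a difftime that is not greater than -600 occurs. Each separate group is assigned an incrementing value. For example: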

> mytable
            created_at user_id mobile_id   status difftime
1  2019-01-02 22:01:38 1227604     68409 finished       \N
2  2019-01-03 04:08:29 1227604     68409 finished     -366
3  2019-01-03 15:16:38 1227604     68409  timeout     -668
4  2019-01-04 00:34:40 1227604     68409   failed     -558
5  2019-01-04 00:27:37 1227605     68453   failed       \N
6  2019-01-04 00:35:56 1227605     68453 finished       -8
7  2019-01-04 01:39:52 1227605     68453 finished      -63
8  2019-01-04 02:05:53 1227605     68453  timeout      -26
9  2019-01-04 02:17:17 1227605     68453  timeout      -11
10 2019-01-04 16:51:39 1227605     68453  timeout     -874

The output I want to create:

> output
            created_at user_id mobile_id   status difftime group rank
1  2019-01-02 22:01:38 1227604     68409 finished       \N    NA   NA
2  2019-01-03 04:08:29 1227604     68409 finished     -366     1    1
3  2019-01-03 15:16:38 1227604     68409  timeout     -668    NA   NA
4  2019-01-04 00:34:40 1227604     68409   failed     -558     2    1
5  2019-01-04 00:27:37 1227605     68453   failed       \N    NA   NA
6  2019-01-04 00:35:56 1227605     68453 finished       -8     3    1
7  2019-01-04 01:39:52 1227605     68453 finished      -63     3    2
8  2019-01-04 02:05:53 1227605     68453  timeout      -26     3    3
9  2019-01-04 02:17:17 1227605     68453  timeout      -11     3    4
10 2019-01-04 16:51:39 1227605     68453  timeout     -874    NA   NA

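When I try just to assign a rank, the query below throws an error: WHERE clause cannot contain aggregations, window functions or grouping operations

I am using Presto SQL, but a solution in any SQL dialect would help me think about how to restructure the query.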

SELECT 
*,
ROW_NUMBER() OVER (PARTITION BY user_id, mobile_id ORDER BY created_at) as rank
from mytable
WHERE DATE_DIFF('minute', created_at, lag(created_at) OVER (PARTITION BY user_id, mobile_id ORDER BY user_id, created_at)) > -600
ORDER BY user_id, mobile_id, created_at
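
The error message itself points at the cause: the lag() call sits in the WHERE clause, where window functions are not allowed. A minimal sketch of one way around it, assuming Presto syntax, is to compute the window function in a subquery and filter on its result in the outer query. The aliases m, s, gap_minutes and rnk are hypothetical names introduced only for this illustration.

```sql
-- Sketch only: move the window function out of WHERE by computing it in a
-- subquery, where it becomes an ordinary column for the outer query.
SELECT s.*,
       ROW_NUMBER() OVER (PARTITION BY user_id, mobile_id
                          ORDER BY created_at) AS rnk
FROM (
    SELECT m.*,
           -- minutes from this row back to the previous row of the same
           -- user and mobile id (negative, NULL for the first row)
           DATE_DIFF('minute', created_at,
                     LAG(created_at) OVER (PARTITION BY user_id, mobile_id
                                           ORDER BY created_at)) AS gap_minutes
    FROM mytable m
) s
WHERE gap_minutes > -600
ORDER BY user_id, mobile_id, created_at
```

This removes the error, but it does not yet do what the question asks: the filtered-out rows disappear from the result, and the rank keeps counting across gaps instead of restarting after each one.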

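To identify the groups, take a cumulative sum over the "invalid" values. Then use dense_rank() to assign a value.

I don't see how your query relates to your question, but the logic looks like this: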

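select t.*,
       -- number the runs across all users, counting only valid rows
       (case when is_valid = 1
             then dense_rank() over (partition by is_valid order by user_id, mobile_id, invalid_cnt)
        end) as grp,
       -- the rank restarts within each run of valid rows
       (case when is_valid = 1
             then row_number() over (partition by user_id, mobile_id, invalid_cnt, is_valid order by created_at)
        end) as rank
from (select t.*,
             (case when difftime > -600 then 1 else 0 end) as is_valid,
             -- running count of "invalid" rows (NULL or <= -600); constant within a run
             sum(case when difftime > -600 then 0 else 1 end) over (partition by user_id, mobile_id order by created_at) as invalid_cnt
      from mytable t
     ) t
order by user_id, mobile_id, created_at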
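
As a cross-check, here is a self-contained sketch of an alternative formulation that is not taken from the answer above: flag the first row of each run with lag(), then take a running sum of those flags. It assumes Presto syntax and inlines the sample rows from the question. The names flagged, numbered, is_valid, is_start, run_id, grp and rnk are hypothetical.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH mytable AS (
    SELECT *
    FROM (VALUES
        (TIMESTAMP '2019-01-02 22:01:38', 1227604, 68409, 'finished', CAST(NULL AS integer)),
        (TIMESTAMP '2019-01-03 04:08:29', 1227604, 68409, 'finished', -366),
        (TIMESTAMP '2019-01-03 15:16:38', 1227604, 68409, 'timeout',  -668),
        (TIMESTAMP '2019-01-04 00:34:40', 1227604, 68409, 'failed',   -558),
        (TIMESTAMP '2019-01-04 00:27:37', 1227605, 68453, 'failed',   CAST(NULL AS integer)),
        (TIMESTAMP '2019-01-04 00:35:56', 1227605, 68453, 'finished', -8),
        (TIMESTAMP '2019-01-04 01:39:52', 1227605, 68453, 'finished', -63),
        (TIMESTAMP '2019-01-04 02:05:53', 1227605, 68453, 'timeout',  -26),
        (TIMESTAMP '2019-01-04 02:17:17', 1227605, 68453, 'timeout',  -11),
        (TIMESTAMP '2019-01-04 16:51:39', 1227605, 68453, 'timeout',  -874)
    ) AS v (created_at, user_id, mobile_id, status, difftime)
),
flagged AS (
    SELECT m.*,
           CASE WHEN difftime > -600 THEN 1 ELSE 0 END AS is_valid,
           -- A run starts on a valid row whose predecessor (same user and
           -- mobile id) is missing, has a NULL difftime, or is <= -600.
           CASE WHEN difftime > -600
                 AND COALESCE(LAG(difftime) OVER (PARTITION BY user_id, mobile_id
                                                  ORDER BY created_at), -600) <= -600
                THEN 1 ELSE 0 END AS is_start
    FROM mytable m
),
numbered AS (
    SELECT f.*,
           -- Running count of run starts over the whole table, so run
           -- numbers keep increasing from one user to the next.
           SUM(is_start) OVER (ORDER BY user_id, mobile_id, created_at) AS run_id
    FROM flagged f
)
SELECT created_at, user_id, mobile_id, status, difftime,
       CASE WHEN is_valid = 1 THEN run_id END AS grp,
       CASE WHEN is_valid = 1
            THEN ROW_NUMBER() OVER (PARTITION BY run_id, is_valid
                                    ORDER BY created_at)
       END AS rnk
FROM numbered
ORDER BY user_id, mobile_id, created_at
```

On the sample rows this should give grp 1 for row 2, grp 2 for row 4, and grp 3 with rnk 1 to 4 for rows 6 to 9, with NULL in both columns for the remaining rows, matching the output table in the question. Because only run starts are counted, consecutive invalid rows do not use up group numbers.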
