Sort records by timestamp, but ignore records less than 30 minutes apart



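I am writing an Amazon Redshift SQL query and trying to order records by three columns: timestamp, cookieID, and trackingpointID. When consecutive records in that ordering share the same cookieID and trackingpointID and are less than 30 minutes apart, I want to drop them and keep only the latest one. For example, given: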

timestamp   cookie  track
9:04:29     A       10420641
9:04:32     A       10420641
9:04:36     A       10420641
9:04:32     A       10420641
10:30:00    A       10420641
10:31:21    A       10420641
9:07:01     A       10881111
9:07:34     A       10881111
9:07:45     A       10881111
9:04:39     A       4326086

I would like to get this result:

timestamp   cookie  track       row
9:04:36     A       10420641    1
10:31:21    A       10420641    2
9:07:45     A       10881111    1
9:04:39     A       4326086     1

It sounds like you want to sessionize the data. You can do this with lag() and a cumulative sum. Something like this:

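-- "timestamp" is quoted because TIMESTAMP is a reserved word in Redshift.
select max("timestamp") as "timestamp",  -- keep the latest record of each session
       cookie, track, sessionid
from (select r.*,
             -- running count of session starts = session number (1, 2, ...);
             -- Redshift requires a frame clause when an aggregate window has ORDER BY
             sum(IsSessionStart) over (partition by cookie, track
                                       order by "timestamp"
                                       rows unbounded preceding) as sessionid
      from (select r.*,
                   -- the first record of a partition has no lag() value, so it starts session 1
                   (case when lag("timestamp") over (partition by cookie, track order by "timestamp") is null
                           or datediff(second,
                                       lag("timestamp") over (partition by cookie, track order by "timestamp"),
                                       "timestamp") >= 30 * 60
                         then 1 else 0 end) as IsSessionStart
            from records r
           ) r
     ) r
group by cookie, track, sessionid;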
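
To check the sessionization logic outside Redshift, here is a minimal Python sketch of the same idea, run on the sample rows from the question. It is an illustration only, not part of the query: the names `SAMPLE`, `parse`, and `last_per_session` are made up for this example, and it assumes a new session starts when the gap to the previous record with the same cookie and track is 30 minutes or more.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)  # assumed session threshold

# Sample rows from the question: (timestamp, cookie, track)
SAMPLE = [
    ("9:04:29", "A", 10420641), ("9:04:32", "A", 10420641),
    ("9:04:36", "A", 10420641), ("9:04:32", "A", 10420641),
    ("10:30:00", "A", 10420641), ("10:31:21", "A", 10420641),
    ("9:07:01", "A", 10881111), ("9:07:34", "A", 10881111),
    ("9:07:45", "A", 10881111), ("9:04:39", "A", 4326086),
]

def parse(ts):
    """Parse an H:MM:SS string into a datetime (date part is a dummy)."""
    return datetime.strptime(ts, "%H:%M:%S")

def last_per_session(records, gap=GAP):
    """Return (timestamp, cookie, track, session) for the latest row of each session."""
    # Equivalent of: partition by cookie, track order by timestamp
    rows = sorted((cookie, track, parse(ts)) for ts, cookie, track in records)
    latest = {}  # (cookie, track, session) -> latest timestamp seen
    prev_key, prev_ts, session = None, None, 0
    for cookie, track, ts in rows:
        key = (cookie, track)
        if key != prev_key:
            session = 1            # first row of a partition starts session 1
        elif ts - prev_ts >= gap:
            session += 1           # large gap: a new session starts
        latest[(cookie, track, session)] = ts  # rows are sorted, so the last write is the max
        prev_key, prev_ts = key, ts
    return [(ts.strftime("%H:%M:%S"), c, t, s) for (c, t, s), ts in latest.items()]

if __name__ == "__main__":
    for row in last_per_session(SAMPLE):
        print(row)
```

The running `session` counter plays the role of the cumulative sum over `IsSessionStart`, and overwriting `latest` on sorted rows corresponds to taking the maximum timestamp per group.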
