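I'm querying Amazon Redshift with SQL and want to order records by three columns: timestamp, cookieID, and trackingpointID. When consecutive records with the same cookieID and trackingpointID are less than 30 minutes apart, I want to drop all but the latest one. For example, if I have: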
timestamp cookie track
9:04:29 A 10420641
9:04:32 A 10420641
9:04:36 A 10420641
9:04:32 A 10420641
10:30:00 A 10420641
10:31:21 A 10420641
9:07:01 A 10881111
9:07:34 A 10881111
9:07:45 A 10881111
9:04:39 A 4326086
I'd like to get a result like this:
timestamp cookie track row
9:04:36 A 10420641 1
10:31:21 A 10420641 2
9:07:45 A 10881111 1
9:04:39 A 4326086 1
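It sounds like you want to sessionize the data. You can do this with lag() and a cumulative sum: flag every row that comes more than 30 minutes after the previous row for the same cookie and track, then take a running sum of those flags to get a session number. Something like this: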
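-- Keep the latest timestamp of each session; sessions are numbered from 1.
select max(timestamp) as last_timestamp, cookie, track, sessionid + 1 as session_number
from (select r.*,
             -- Running total of session starts = session index within (cookie, track).
             -- Redshift requires a frame clause when an aggregate window has ORDER BY.
             sum(IsSessionStart) over (partition by cookie, track
                                       order by timestamp
                                       rows unbounded preceding) as sessionid
      from (select r.*,
                   -- Flag a row when more than 30 minutes have passed since the previous row.
                   (case when datediff(min,
                                       lag(timestamp) over (partition by cookie, track order by timestamp),
                                       timestamp) > 30
                         then 1 else 0 end) as IsSessionStart
            from records r
           ) r
     ) r
group by cookie, track, sessionid
order by cookie, track, sessionid;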
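
To sanity-check the sessionization logic against the sample data from the question, here is a minimal Python sketch of the same idea. It is not Redshift code: the names `ROWS` and `sessionize` are made up for illustration, the timestamps are assumed to fall on a single day, and the gap is compared as exact elapsed time, which can differ slightly from how DATEDIFF counts minute boundaries.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (timestamp, cookie, track).
ROWS = [
    ("9:04:29", "A", 10420641),
    ("9:04:32", "A", 10420641),
    ("9:04:36", "A", 10420641),
    ("9:04:32", "A", 10420641),
    ("10:30:00", "A", 10420641),
    ("10:31:21", "A", 10420641),
    ("9:07:01", "A", 10881111),
    ("9:07:34", "A", 10881111),
    ("9:07:45", "A", 10881111),
    ("9:04:39", "A", 4326086),
]


def sessionize(rows, gap=timedelta(minutes=30)):
    """Return (latest_timestamp, cookie, track, session_number) per session."""
    # Sorting by (cookie, track, timestamp) plays the role of
    # PARTITION BY cookie, track ORDER BY timestamp.
    parsed = sorted(
        ((datetime.strptime(ts, "%H:%M:%S"), cookie, track)
         for ts, cookie, track in rows),
        key=lambda r: (r[1], r[2], r[0]),
    )
    latest = {}
    for (cookie, track), group in groupby(parsed, key=lambda r: (r[1], r[2])):
        session, previous = 1, None
        for ts, _, _ in group:
            # A gap larger than `gap` since the previous row starts a new session.
            if previous is not None and ts - previous > gap:
                session += 1
            # Rows arrive in ascending order, so the last write is the maximum.
            latest[(cookie, track, session)] = ts
            previous = ts
    return [
        (f"{ts.hour}:{ts.minute:02d}:{ts.second:02d}", cookie, track, session)
        for (cookie, track, session), ts in latest.items()
    ]


for row in sessionize(ROWS):
    print(row)
```

On the sample data this yields the four rows the question asks for, though ordered by cookie and numeric track rather than in the order shown in the question.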