Removing adjacent duplicate events from a sequence in Presto SQL

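I'm looking for a way in SQL (I'm using Presto) to collect the unique sequence of events. I need to remove adjacent duplicate events, always keeping the latest one.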

For example, in this dataset event B is repeated:

id	event	time
1	A	10:01 AM
1	B	10:02 AM
1	B	10:03 AM
1	A	10:04 AM
1	B	10:05 AM

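You can use the gaps-and-islands approach: find where the value changes (using the lag window function), then take a cumulative sum of those changes to assign each run of equal events to a group: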

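-- sample data (times are strings, which sort correctly for this small example)
WITH dataset(id, event, time) AS (
    VALUES (1, 'A', '10:01 AM'),
           (1, 'B', '10:02 AM'),
           (1, 'B', '10:03 AM'),
           (1, 'A', '10:04 AM'),
           (1, 'B', '10:05 AM')
)
-- query
select id,
       max(event) as event,  -- every row in a group has the same event
       max(time) as time     -- latest time within the run
from (
    select id,
           event,
           time,
           -- running count of changes: rows in the same run share a grp value
           sum(if(event != prev_event, 1, 0)) over (partition by id order by time) as grp
    from (
        select *,
               -- previous event for the same id (NULL on the first row)
               lag(event) over (partition by id order by time) as prev_event
        from dataset
    )
)
group by id, grp;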

Output:

id	event	time
1	A	10:01 AM
1	B	10:03 AM
1	A	10:04 AM
1	B	10:05 AM
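
To check what the query computes, the same collapsing logic can be sketched outside SQL. The Python below is only an illustration, not part of the Presto solution: the function name `collapse_adjacent` is made up for this example, and it assumes the rows fit in memory and that the time strings sort correctly, which holds for the sample data only.

```python
from itertools import groupby

# Sample rows copied from the query above: (id, event, time).
rows = [
    (1, "A", "10:01 AM"),
    (1, "B", "10:02 AM"),
    (1, "B", "10:03 AM"),
    (1, "A", "10:04 AM"),
    (1, "B", "10:05 AM"),
]


def collapse_adjacent(rows):
    """Keep the latest row of each run of identical adjacent events per id.

    Hypothetical helper for illustration only.
    """
    # Sort by id, then time; plain string comparison is enough for this sample.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    # groupby starts a new group whenever (id, event) changes, which is the
    # same boundary the lag comparison detects in the SQL version.
    for _, run in groupby(ordered, key=lambda r: (r[0], r[1])):
        result.append(list(run)[-1])  # last row of the run has the latest time
    return result


for row in collapse_adjacent(rows):
    print(row)
```

Each printed tuple corresponds to one `(id, grp)` group in the SQL query, so the two B rows at 10:02 AM and 10:03 AM collapse into the single 10:03 AM row.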
