Select distinct consecutive values from a time series



I have a table in Snowflake/dbt, and I want to select the distinct consecutive entries among its rows. For example, if I have

timestamp
21-08-01 12:00:01
21-08-01 12:00:02
21-08-01 12:00:03
21-08-01 12:00:04
21-08-01 12:00:06
21-08-01 12:00:10
21-08-01 12:00:10

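This is known as a gaps-and-islands problem. It is usually solved by building a group key from the difference between two row numbers: one computed over all rows and one computed within each partition.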

select
    user_id, session_id, action, max(timestamp)
from
(
    select
        user_id, session_id, action, timestamp,
        row_number() over (order by timestamp, user_id, session_id, action) -
        row_number() over (partition by user_id, session_id, action order by timestamp)
            as grp
    from mytable
)
group by grp, user_id, session_id, action
order by grp, user_id, session_id, action;
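To see why the difference of the two row numbers works as a group key, here is a minimal sketch that runs the same query shape from Python against an in-memory SQLite database (SQLite 3.25 or later is needed for window functions). SQLite only stands in for Snowflake here. The sample rows, the column name `ts`, and the helper `last_row_per_island` are assumptions made for illustration, not the asker's data.

```python
import sqlite3

# Hypothetical sample rows: (user_id, session_id, action, ts).
# Timestamps are kept distinct so that row_number() is deterministic.
ROWS = [
    ("u1", "s1", "view",  "12:00:01"),
    ("u1", "s1", "view",  "12:00:02"),
    ("u1", "s1", "click", "12:00:03"),
    ("u1", "s1", "click", "12:00:04"),
    ("u1", "s1", "view",  "12:00:06"),
    ("u1", "s1", "buy",   "12:00:10"),
    ("u1", "s1", "buy",   "12:00:11"),
]

ISLANDS_SQL = """
select user_id, session_id, action, max(ts) as last_ts
from (
    select user_id, session_id, action, ts,
           row_number() over (order by ts, user_id, session_id, action)
         - row_number() over (partition by user_id, session_id, action
                              order by ts) as grp
    from mytable
)
group by grp, user_id, session_id, action
order by last_ts
"""


def last_row_per_island(rows):
    """Return the last row of every run of identical (user, session, action)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table mytable "
        "(user_id text, session_id text, action text, ts text)"
    )
    con.executemany("insert into mytable values (?, ?, ?, ?)", rows)
    result = con.execute(ISLANDS_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in last_row_per_island(ROWS):
        print(row)
```

On these rows the inner query yields grp values 0, 0, 2, 2, 2, 5, 5. The click island and the second view island both get grp = 2, so grp alone is not unique, which is why the outer query also groups by user_id, session_id and action. For the same reason, ordering by grp is not guaranteed to be chronological, so the sketch orders by the last timestamp of each island instead.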

I tried something a little different from Gordon: I built an inline view that carries the contents of the "next" record (using the LEAD function).

select user_id, session_id, action, ts
from (
    select abc.*,
           lead(user_id) ignore nulls
               over (order by ts, user_id, session_id, action) next_user_id,
           lead(session_id) ignore nulls
               over (order by ts, user_id, session_id, action) next_session_id,
           lead(action) ignore nulls
               over (order by ts, user_id, session_id, action) next_action,
           lead(ts) ignore nulls
               over (order by ts, user_id, session_id, action) next_ts
    from   abc
    order by ts, user_id, session_id, action)
where user_id = NVL(next_user_id, user_id)
and   session_id = NVL(next_session_id, session_id)
and   action <> NVL(next_action, 'x')
order by ts, user_id, session_id, action;

This works well; I was able to get the same four records you wanted.
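The "compare each row with the next one" idea can be sketched as follows, again from Python against in-memory SQLite (3.25 or later). This is a simplified variant, not the query above: it keeps a row when there is no next row or when the next row's user_id, session_id or action differs, and it assumes none of those columns is NULL. The sample rows and the helper name `last_row_by_lead` are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (user_id, session_id, action, ts).
ROWS = [
    ("u1", "s1", "view",  "12:00:01"),
    ("u1", "s1", "view",  "12:00:02"),
    ("u1", "s1", "click", "12:00:03"),
    ("u1", "s1", "click", "12:00:04"),
    ("u1", "s1", "view",  "12:00:06"),
    ("u1", "s1", "buy",   "12:00:10"),
    ("u1", "s1", "buy",   "12:00:11"),
]

LEAD_SQL = """
select user_id, session_id, action, ts
from (
    select t.*,
           lead(user_id)    over (order by ts, user_id, session_id, action)
               as next_user_id,
           lead(session_id) over (order by ts, user_id, session_id, action)
               as next_session_id,
           lead(action)     over (order by ts, user_id, session_id, action)
               as next_action
    from abc t
)
-- Keep a row if it is the last row overall, or if the next row differs.
-- Assumes action is never NULL, so a NULL next_action means "no next row".
where next_action is null
   or user_id    <> next_user_id
   or session_id <> next_session_id
   or action     <> next_action
order by ts
"""


def last_row_by_lead(rows):
    """Return rows whose successor belongs to a different run (or is absent)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table abc "
        "(user_id text, session_id text, action text, ts text)"
    )
    con.executemany("insert into abc values (?, ?, ?, ?)", rows)
    result = con.execute(LEAD_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in last_row_by_lead(ROWS):
        print(row)
```

On these sample rows it returns four rows, at 12:00:02, 12:00:04, 12:00:06 and 12:00:11, one per run of identical values.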

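You can try the following approach, which groups consecutive occurrences of the same action and selects the most recent row from each group, in the order in which the groups appear.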

SELECT
    user_id,
    session_id,
    action,
    timestamp
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY user_id,session_id,action,gn
            ORDER BY timestamp DESC
        ) as rn
    FROM (
        SELECT
            *,
            SUM(continued) OVER (ORDER BY timestamp) as gn
        FROM (
            SELECT
                *,
                CASE
                    WHEN
                        LAG(
                            CONCAT(user_id,session_id,action),
                            1,
                            CONCAT(user_id,session_id,action)
                        ) OVER (
                            ORDER BY timestamp
                        ) = CONCAT(user_id,session_id,action) THEN 0
                    ELSE 1
                END as continued
            FROM
                my_table
        ) t2
    ) t1
) t
WHERE rn=1
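To make the two inner steps visible (the change flag and the running sum that becomes the group number gn), here is a small sketch that prints those intermediate columns. It runs from Python against in-memory SQLite (3.25 or later) rather than Snowflake, so it uses `||` instead of CONCAT and a plain LAG without a default. The sample rows, the column names `ts`, `k` and `changed`, and the helper `label_islands` are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample rows: (user_id, session_id, action, ts).
ROWS = [
    ("u1", "s1", "view",  "12:00:01"),
    ("u1", "s1", "view",  "12:00:02"),
    ("u1", "s1", "click", "12:00:03"),
    ("u1", "s1", "click", "12:00:04"),
    ("u1", "s1", "view",  "12:00:06"),
    ("u1", "s1", "buy",   "12:00:10"),
    ("u1", "s1", "buy",   "12:00:11"),
]

LABEL_SQL = """
select action, ts, changed,
       sum(changed) over (order by ts) as gn   -- running count of changes
from (
    select action, ts,
           -- 1 when the key differs from the previous row's key; the first
           -- row has no previous row (lag is NULL), so it falls to 0.
           case when lag(k) over (order by ts) <> k then 1 else 0 end
               as changed
    from (
        select action, ts,
               user_id || '|' || session_id || '|' || action as k
        from my_table
    )
)
order by ts
"""


def label_islands(rows):
    """Return (action, ts, changed, gn) for every row, in timestamp order."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table my_table "
        "(user_id text, session_id text, action text, ts text)"
    )
    con.executemany("insert into my_table values (?, ?, ?, ?)", rows)
    result = con.execute(LABEL_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in label_islands(ROWS):
        print(row)
```

On these rows the change flags are 0, 0, 1, 0, 1, 1, 0 and gn is 0, 0, 1, 1, 2, 3, 3, so each run of identical values gets its own gn. The '|' separator in the key is my addition: without a separator, different column combinations such as ('ab', 'c') and ('a', 'bc') would concatenate to the same string.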

Let me know if this works for you.
