I have a table in Snowflake/dbt, and I want to select the distinct consecutive entries among its rows. For example, if I have:
timestamp
21-08-01 12:00:01
21-08-01 12:00:02
21-08-01 12:00:03
21-08-01 12:00:04
21-08-01 12:00:06
21-08-01 12:00:10
21-08-01 12:00:10
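This is known as a gaps-and-islands problem. It is usually solved by building a group key from the difference of two row numbers: one computed over all rows and one computed within each (user_id, session_id, action) partition. The difference stays constant within a run of consecutive identical rows.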
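select
    user_id, session_id, action,
    max(timestamp) as last_timestamp
from
(
    select
        user_id, session_id, action, timestamp,
        -- global row number minus per-key row number: constant within a run of identical keys
        row_number() over (order by timestamp, user_id, session_id, action) -
        row_number() over (partition by user_id, session_id, action order by timestamp)
            as grp
    from mytable
)
group by grp, user_id, session_id, action
-- grp is only a label and need not increase with time, so sort by the timestamp instead
order by last_timestamp, user_id, session_id, action;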
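To see why the row-number difference works as a group key, it can help to look at the intermediate values. The following is a minimal sketch on made-up rows: the sample values and the names numbered, rn_all and rn_key are assumptions for illustration, not data from the question.

```sql
-- Hypothetical sample data; only the shape matters.
with mytable as (
    select column1 as user_id, column2 as session_id, column3 as action,
           to_timestamp(column4) as timestamp
    from values
        (1, 's1', 'view',  '2021-08-01 12:00:01'),
        (1, 's1', 'view',  '2021-08-01 12:00:02'),
        (1, 's1', 'click', '2021-08-01 12:00:03'),
        (1, 's1', 'view',  '2021-08-01 12:00:04'),
        (1, 's1', 'view',  '2021-08-01 12:00:06')
),
numbered as (
    select
        action, timestamp,
        -- position among all rows
        row_number() over (order by timestamp, user_id, session_id, action) as rn_all,
        -- position among rows sharing the same key
        row_number() over (partition by user_id, session_id, action order by timestamp) as rn_key
    from mytable
)
select action, timestamp, rn_all, rn_key, rn_all - rn_key as grp
from numbered
order by timestamp;
-- action  time      rn_all  rn_key  grp
-- view    12:00:01  1       1       0
-- view    12:00:02  2       2       0
-- click   12:00:03  3       1       2
-- view    12:00:04  4       3       1
-- view    12:00:06  5       4       1
```

Each run of identical rows ends up with its own (key, grp) pair. Note that grp itself is not chronological (here 0, 2, 1), so it is only useful for grouping, not for ordering the output.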
I tried something a little different from Gordon: building an inline view that carries the contents of the "next" record (using the LEAD function).
select user_id, session_id, action, ts
from (
    -- attach the values of the following row (in global ts order) to each row
    select abc.*,
        lead(user_id) ignore nulls
            over (order by ts, user_id, session_id, action) next_user_id,
        lead(session_id) ignore nulls
            over (order by ts, user_id, session_id, action) next_session_id,
        lead(action) ignore nulls
            over (order by ts, user_id, session_id, action) next_action,
        lead(ts) ignore nulls
            over (order by ts, user_id, session_id, action) next_ts
    from abc
    order by ts, user_id, session_id, action)
-- keep a row when the next row has the same user and session but a different action;
-- the NVL defaults let the final row pass too (unless its action is literally 'x')
where user_id = NVL(next_user_id, user_id)
and session_id = NVL(next_session_id, session_id)
and action <> NVL(next_action, 'x')
order by ts, user_id, session_id, action;
This worked fine, and I was able to get the same four records you wanted.
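Since this is Snowflake, the same "compare with the next row" idea could probably be written more compactly with QUALIFY. This is only a sketch under a few assumptions: runs are evaluated within each user_id/session_id (unlike the query above, which orders all rows globally), ts is unique within a session, action is never NULL, and the sample rows are made up for illustration.

```sql
-- Hypothetical sample data standing in for the abc table.
with abc as (
    select column1 as user_id, column2 as session_id, column3 as action,
           to_timestamp(column4) as ts
    from values
        (1, 's1', 'view',  '2021-08-01 12:00:01'),
        (1, 's1', 'view',  '2021-08-01 12:00:02'),
        (1, 's1', 'click', '2021-08-01 12:00:03'),
        (1, 's1', 'view',  '2021-08-01 12:00:04'),
        (1, 's1', 'view',  '2021-08-01 12:00:06')
)
select user_id, session_id, action, ts
from abc
-- Keep a row when the next row in the same session has a different action,
-- or when there is no next row (LEAD then returns NULL, which IS DISTINCT FROM
-- any non-NULL action).
qualify lead(action) over (partition by user_id, session_id order by ts)
        is distinct from action
order by ts, user_id, session_id, action;
-- With the sample rows this keeps the rows at 12:00:02, 12:00:03 and 12:00:06.
```

Partitioning by user and session means a change of user or session always ends a run, which may or may not be what you want compared with the global ordering used above.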
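You can try the following, which groups consecutive occurrences of the same action and, for each group in order of appearance, selects the most recent row.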
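SELECT
    user_id,
    session_id,
    action,
    timestamp
FROM (
    SELECT
        *,
        -- rn = 1 marks the latest row within each run
        ROW_NUMBER() OVER (
            PARTITION BY user_id,session_id,action,gn
            ORDER BY timestamp DESC
        ) as rn
    FROM (
        SELECT
            *,
            -- running total of the change flags gives each run its own number
            SUM(continued) OVER (ORDER BY timestamp) as gn
        FROM (
            SELECT
                *,
                -- 0 when the key matches the previous row, 1 when a new run starts;
                -- the LAG default is the row's own key, so the first row gets 0
                CASE
                    WHEN
                        LAG(
                            CONCAT(user_id,session_id,action),
                            1,
                            CONCAT(user_id,session_id,action)
                        ) OVER (
                            ORDER BY timestamp
                        ) = CONCAT(user_id,session_id,action) THEN 0
                    ELSE 1
                END as continued
            FROM
                my_table
        ) t2
    ) t1
) t
WHERE rn=1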
Let me know if this works for you.
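As a possible shortcut, Snowflake has a CONDITIONAL_CHANGE_EVENT window function that numbers runs directly, which could stand in for the LAG plus running SUM steps. The sketch below assumes that timestamps are unique, that none of the three key columns is NULL, and that '|' never appears in the values; the sample rows and the name numbered are made up for illustration.

```sql
-- Hypothetical sample data standing in for my_table.
with my_table as (
    select column1 as user_id, column2 as session_id, column3 as action,
           to_timestamp(column4) as timestamp
    from values
        (1, 's1', 'view',  '2021-08-01 12:00:01'),
        (1, 's1', 'view',  '2021-08-01 12:00:02'),
        (1, 's1', 'click', '2021-08-01 12:00:03'),
        (1, 's1', 'view',  '2021-08-01 12:00:04'),
        (1, 's1', 'view',  '2021-08-01 12:00:06')
),
numbered as (
    select
        user_id, session_id, action, timestamp,
        -- Counter that starts at 0 and increments whenever the combined key
        -- differs from the previous row's key (in timestamp order).
        conditional_change_event(
            to_varchar(user_id) || '|' || session_id || '|' || action
        ) over (order by timestamp) as gn
    from my_table
)
select user_id, session_id, action, timestamp
from numbered
-- keep only the latest row of each run
qualify row_number() over (partition by gn order by timestamp desc) = 1
order by timestamp;
-- With the sample rows, gn is 0, 0, 1, 2, 2 and the rows kept are those at
-- 12:00:02, 12:00:03 and 12:00:06.
```

The explicit separator in the key avoids a pitfall of plain concatenation, where different column combinations (for example user 1 with session 12 and user 11 with session 2) could produce the same string.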