I have the following data:
USER TIMESTAMP DATA
0001 2021-05-18 20:40:06.251 alpha
0002 2021-05-18 22:40:06.251 beta
0002 2021-05-18 21:40:06.251 gamma
0003 2021-05-18 19:40:06.251 delta
0004 2021-05-18 01:40:06.251 epsilon
0003 2021-05-18 20:40:06.251 zeta
I want to build a list of all DISTINCT users, together with the data from each user's FIRST (earliest) TIMESTAMP.
The result should look like this:
USER TIMESTAMP DATA
0001 2021-05-18 20:40:06.251 alpha
0002 2021-05-18 21:40:06.251 gamma
0003 2021-05-18 19:40:06.251 delta
0004 2021-05-18 01:40:06.251 epsilon
Right now I can solve this with the following query:
SELECT *
FROM df a
WHERE a.TIMESTAMP IN (
    SELECT min(TIMESTAMP)
    FROM df AS b
    WHERE a.USER = b.USER
)
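However, this is computationally heavy for my server, so I need a different approach. I have been trying to join df to itself on USER and TIMESTAMP, but I haven't gotten it to work. Is there a faster way? I'm doing this in Snowflake, by the way.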
You could try a window function:
select df.*
from (select df.*,
             row_number() over (partition by user order by timestamp) as seqnum
      from df
     ) df
where seqnum = 1;
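To check the window-function approach against the sample rows from the question, here is a minimal sketch that runs the same pattern in an in-memory SQLite database from Python. This is only a stand-in for Snowflake: it assumes a SQLite build with window-function support (3.25 or later), stores the timestamps as text, and the function name first_row_per_user is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question: (USER, TIMESTAMP, DATA).
ROWS = [
    ("0001", "2021-05-18 20:40:06.251", "alpha"),
    ("0002", "2021-05-18 22:40:06.251", "beta"),
    ("0002", "2021-05-18 21:40:06.251", "gamma"),
    ("0003", "2021-05-18 19:40:06.251", "delta"),
    ("0004", "2021-05-18 01:40:06.251", "epsilon"),
    ("0003", "2021-05-18 20:40:06.251", "zeta"),
]


def first_row_per_user(rows):
    """Return the earliest row per user (hypothetical helper, SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Timestamps are stored as text; this fixed-width format sorts
        # chronologically when compared as strings.
        conn.execute(
            'CREATE TABLE df ("USER" TEXT, "TIMESTAMP" TEXT, "DATA" TEXT)'
        )
        conn.executemany("INSERT INTO df VALUES (?, ?, ?)", rows)
        query = """
            SELECT "USER", "TIMESTAMP", "DATA"
            FROM (
                SELECT df.*,
                       ROW_NUMBER() OVER (
                           PARTITION BY "USER" ORDER BY "TIMESTAMP"
                       ) AS seqnum
                FROM df
            ) AS ranked
            WHERE seqnum = 1
            ORDER BY "USER"
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_row_per_user(ROWS):
        print(*row)
```

One difference from the original IN (SELECT min(...)) query: if a user has two rows with the same earliest timestamp, row_number() keeps just one of them (which one is not defined), while the original query returns both. Snowflake also has a QUALIFY clause that can filter on the window function directly, which would avoid the subquery.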