我有下表,我需要找到同一个人具有不同状态的连续两天(天之间没有间隔(。
name date status
-------------------------------
John 2020-06-01 submitted
John 2020-06-02 pending
John 2020-06-03 approved
John 2020-06-04 approved
Amy 2020-06-02 pending
Amy 2020-06-03 pending
Amy 2020-06-04 pending
Dan 2020-06-02 submitted
Dan 2020-06-03 approved
Dan 2020-06-04 approved
Mary 2020-06-03 submitted
Mary 2020-06-04 pending
输出应如下所示:
name date status
-------------------------------
John 2020-06-01 submitted
John 2020-06-02 pending
John 2020-06-03 approved
Dan 2020-06-02 submitted
Dan 2020-06-03 approved
Mary 2020-06-03 submitted
Mary 2020-06-04 pending
目前,我导出了该表并编写了python代码来执行此操作。但是,我想知道是否可以仅使用SQL来实现这一点?(我研究了SQL:仅检索值已更改的记录,但无法弄清楚如何在我的情况下使其工作,因为status
字段是字符串而不是数字(谢谢!
我会根据名称、连续天数和不同的状态自行加入表格:
SELECT a.name, a.date, a.status
FROM mytable a
JOIN mytable b ON a.name = b.name AND
a.date + INTERVAL '1' DAY = b.date AND
a.status <> b.status
使用LAG()
和LEAD()
窗口函数:
select t.name, t.date, t.status
from (
select *,
coalesce(lag(status) over (partition by name order by date), status) prev_status,
coalesce(lead(status) over (partition by name order by date), status) next_status
from tablename
) t
where t.status <> t.prev_status or t.status <> t.next_status
order by t.name, t.date
请参阅演示。
结果:
| name | date | status |
| ---- | ---------- | --------- |
| Dan | 2020-06-02 | submitted |
| Dan | 2020-06-03 | approved |
| John | 2020-06-01 | submitted |
| John | 2020-06-02 | pending |
| John | 2020-06-03 | approved |
| Mary | 2020-06-03 | submitted |
| Mary | 2020-06-04 | pending |