使用分区的最早日期来确定属于该分区的其他日期

>假设这是我的表：

ID  DATE
--------------
1   2018-11-12 
2   2018-11-13
3   2018-11-14
4   2018-11-15
5   2018-11-16
6   2019-03-05
7   2019-05-07
8   2019-05-08
9   2019-05-08

我需要让分区由分区中的第一个日期确定。其中，第一个日期后 2 天内的任何日期都属于同一分区。

如果对每个分区进行排名，则表最终将如下所示

PARTITION ID  DATE
------------------------
1         1   2018-11-12 
1         2   2018-11-13
1         3   2018-11-14
2         4   2018-11-15
2         5   2018-11-16
3         6   2019-03-05
4         7   2019-05-07
4         8   2019-05-08
4         9   2019-05-08

我尝试使用带有滞后的datediff与上一个日期进行比较，但这会允许根据间距对分区进行不适当的调整，例如所有这些日期都将包含在同一个分区中：

ID  DATE
--------------
1   2018-11-12 
2   2018-11-14
3   2018-11-16
4   2018-11-18
3   2018-11-20
4   2018-11-22

以前的有缺陷的尝试：

当日期比前一个日期晚 2 天以上时标记：

(case when datediff(day, lag(event_time, 1) over (partition by user_id, stage order by event_time), event_time) > 2 then 1 else 0 end)

为此需要使用

递归 CTE，因此操作成本很高。

with t as (
      -- add an incrementing column with no gaps
      select t.*, row_number() over (order by date) as seqnum
      from t
     ),
     cte as (
      select id, date, date as mindate, seqnum
      from t
      where seqnum = 1
      union all
      select t.id, t.date,
             (case when t.date <= dateadd(day, 2, cte.mindate)
                   then cte.mindate else t.date
              end) as mindate,
             t.seqnum
      from cte join
           t
           on t.seqnum = cte.seqnum + 1
    )
select cte.*, dense_rank() over (partition by mindate) as partition_num
from cte;

相关内容

最新更新

热门标签：