在 SQL/BigQuery 中组合范围



我想在 BigQuery 中组合相邻的日期范围。

我有一张这样的桌子:

ID   START                     END
1    2019-01-18 17:34:58 UTC   2019-02-18 12:14:59 UTC
1    2019-02-18 06:04:39 UTC   2019-02-18 08:05:05 UTC
1    2019-02-18 08:05:05 UTC   2019-02-18 10:06:05 UTC
1    2019-02-18 10:06:05 UTC   2019-02-19 11:16:15 UTC
2    2019-01-19 06:02:29 UTC   2019-01-29 11:02:23 UTC

由于中间的三行代表分成三部分的单个范围,因此我想将它们组合在一起,使表格如下所示:

ID   START                     END
1    2019-01-18 17:34:58 UTC   2019-02-18 12:14:59 UTC
1    2019-02-18 06:04:39 UTC   2019-02-19 11:16:15 UTC
2    2019-01-19 06:02:29 UTC   2019-01-29 11:02:23 UTC

实现这一目标的最佳方法是什么?

您需要确定范围的开始位置。 在您的情况下,它们似乎具有完全匹配的结束和开始,因此您可以使用lag()来识别组的开始位置。启动的累积计数提供了一个分组 ID,可用于聚合:

select id, min(start) as start, max(end) as end
from (select t.*, countif(prev_end is null or prev_end <> start) over (partition by id order by start) as grp
      from (select t.*, lag(end) over (partition by id order by start) as prev_end
            from t
           ) t
     ) t
group by id, grp;

如果组可以重叠,则累积最大值通常可以解决问题:

select id, min(start) as start, max(end) as end
from (select t.*, countif(prev_end is null or prev_end <> start) over (partition by id order by start) as grp
      from (select t.*,
                   max(end) over (partition by id order by start rows between unbounded preceding and 1 preceding) as prev_end
            from t
           ) t
     ) t
group by id, grp;

最新更新