为PostgreSQL中的所有值选择最新连续组的开头



我有下表:

周开始 bcid
2022-07-18 df9bd190-417d-4007-9dc7-1de0a5ba4045
2022-07-25 e46ac8ed-bcc5-4dab-9694-479637e1ee99
2022-08-01 e46ac8ed-bcc5-4dab-9694-479637e1ee99
2022-08-01 df9bd190-417d-4007-9dc7-1de0a5ba4045
2022-08-08 df9bd190-417d-4007-9dc7-1de0a5ba4045
2022-08-08 ca3fd5c6-73d0-4104-aa03-dcc0eb263c0a
2022-08-08 e46ac8ed-bcc5-4dab-9694-479637e1ee99

首先,使用lag()标记所有组以检查是否有中断。

接下来,将组号赋予分组(将boolean强制转换为int将为true赋予1,为false赋予0(。

使用distinct on抓取要保存的记录:

with mark_gaps as (
select *,
"weekStart" > lag("weekStart") over w + interval '7 days' as new_group
from tx
window w as (partition by bcid order by "weekStart")
), group_nums as (
select *, 
sum(coalesce(new_group, true)::int) over w as group_num
from mark_gaps
window w as (partition by bcid order by "weekStart")
)
select distinct on (bcid) bcid, "weekStart"
from group_nums
order by bcid, group_num desc, "weekStart";

此处更新小提琴

最新更新