PostgreSQL: How to search for consecutive events



In Postgres, how do I get all users who had at least one "event" in each of three consecutive weeks?

my_events

id  user_id  created_at
------------------------------------------
1   1234     2015-05-05 22:49:18.345787
2   4567     2015-05-05 22:49:30.343386
3   1234     2015-05-05 22:50:16.465029
...

What I've tried:

Step 1 (get all the weeks in which a user had an event):

SELECT
user_id,
date_part('year', created_at) AS year,
date_part('week', created_at) AS week,
COUNT(1) AS events
FROM my_events
GROUP BY user_id, date_part('year', created_at), date_part('week', created_at)

Step 2 (count all the weeks in which a user had an event):

SELECT
user_id,
COUNT(1) AS weeks_with_events
FROM (
  -- Get all weeks where a user had an event.
  SELECT
  user_id,
  date_part('year', created_at) AS year,
  date_part('week', created_at) AS week,
  COUNT(1) AS events
  FROM my_events
  GROUP BY user_id, date_part('year', created_at), date_part('week', created_at)
) a
GROUP BY user_id

Unfortunately, this only gives me the number of weeks with events; it doesn't check whether those weeks are consecutive.

with x as 
(SELECT DISTINCT  -- one row per user and week, so repeated events in a week don't break lead()/lag()
 user_id,
 date_part('year', created_at) AS year,
 date_part('week', created_at) AS week
 FROM my_events)
, y as (select user_id, year, week as curr, 
   -- nxt/prev stay NULL at the edges, so the comparisons below simply don't match there
   lead(week) over(partition by user_id,year order by week) as nxt,
   lag(week) over(partition by user_id,year order by week) as prev 
  from x)
  select distinct user_id 
  from y
  where nxt-curr = 1 and curr-prev = 1

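You can try this, since you need users with at least one event in each of 3 consecutive weeks. The premise is that if a user has at least one event in each of 3 consecutive weeks, there will be at least one row for that user where nxt-curr and curr-prev are both 1.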
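
To see why that premise holds, here is a small sketch on made-up data (the user ids and week numbers are hypothetical, not from the question). It only shows what curr, nxt and prev look like per row:

```sql
-- Hypothetical sample: user 1 is active in weeks 18, 19, 20; user 2 in weeks 18 and 20.
WITH x (user_id, year, week) AS (
    VALUES (1, 2015, 18), (1, 2015, 19), (1, 2015, 20),
           (2, 2015, 18), (2, 2015, 20)
)
SELECT user_id,
       week AS curr,
       lead(week) OVER (PARTITION BY user_id, year ORDER BY week) AS nxt,
       lag(week)  OVER (PARTITION BY user_id, year ORDER BY week) AS prev
FROM x
ORDER BY user_id, week;
-- Only the row (user 1, curr = 19) has nxt - curr = 1 and curr - prev = 1,
-- so only user 1 would be returned by the filter.
```

The middle week of any 3-week run is the row that satisfies both conditions, which is why one matching row per user is enough.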

I assume you're starting from this query:

SELECT user_id, date_part('year', created_at) AS yyyy,
       date_part('week', created_at) AS ww, COUNT(*) AS events
FROM my_events
GROUP BY user_id, date_part('year', created_at), date_part('week', created_at);

It's not the most general solution, but the simplest is probably to use lead()/lag():

with t as (
      -- one row per user and week that has at least one event
      SELECT user_id, date_part('year', created_at) AS yy,
             date_part('week', created_at) AS ww, COUNT(*) AS events
      FROM my_events
      GROUP BY user_id, date_part('year', created_at), date_part('week', created_at)
     )
select user_id, ww
from (select t.*,
             -- the next two active weeks for the same user (order by the yy alias, not "year")
             lead(ww, 1) over (partition by user_id order by yy, ww) as ww1,
             lead(ww, 2) over (partition by user_id order by yy, ww) as ww2
      from t
     ) t
where ww1 = ww + 1 and ww2 = ww + 2;

This will have problems at year end, because the week-of-year numbering starts over.
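
As a quick illustration of that year-end problem, the sketch below runs date_part() on three literal Mondays that I picked for the example (they are not from the question's data):

```sql
-- Literal dates chosen for illustration; 'week' uses ISO 8601 numbering in PostgreSQL.
SELECT d                    AS day,
       date_part('year', d) AS yy,
       date_part('week', d) AS ww
FROM (VALUES (DATE '2015-12-21'), (DATE '2015-12-28'), (DATE '2016-01-04')) AS v(d);
-- ww comes out as 52, 53, 1: these are three consecutive weeks,
-- but the check ww1 = ww + 1 fails between week 53 and week 1.
```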

EDIT:

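To handle year end, you'd want to do something like yyyy*52 + ww. Alas, that doesn't work for weeks, because a year is not exactly 52 weeks long.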

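Instead, pick an arbitrary origin date that falls on the correct day of the week. Then count the number of days since that date, divide by 7, and truncate. That gives you an absolute week number that is independent of the year. From there, the rest of the query is basically the same.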
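
A minimal sketch of that idea, under a few assumptions of mine: the origin DATE '2000-01-03' (a Monday) is an arbitrary choice, weeks are taken to start on Monday, and all events are assumed to fall after the origin so that integer division truncates the way we want. The column name abs_week is just for illustration.

```sql
WITH weeks AS (
    -- One row per user and absolute week.
    -- date - date yields an integer number of days; integer / 7 truncates.
    SELECT DISTINCT
           user_id,
           (created_at::date - DATE '2000-01-03') / 7 AS abs_week
    FROM my_events
)
SELECT DISTINCT user_id
FROM (
    SELECT user_id,
           abs_week,
           -- the active week two rows ahead for the same user
           lead(abs_week, 2) OVER (PARTITION BY user_id ORDER BY abs_week) AS abs_week2
    FROM weeks
) w
-- Weeks are distinct and sorted, so if the week two rows ahead is abs_week + 2,
-- the row in between must be abs_week + 1: three consecutive weeks.
WHERE abs_week2 = abs_week + 2;
```

Because abs_week never resets, the same comparison works across a year boundary. If created_at is a timestamptz, the cast to date depends on the session time zone, so that may need pinning down.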
