Count the number of weeks in which each ID appears



Background

I have an events table, simplified here for brevity:

event_time timestamp with time zone NOT NULL,
user_id character varying(100) NOT NULL,
... more

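An example of an event is a user clicking something on a web page. Many users create many events. The sample size is only about 100,000 users: some use the web app once, some use it regularly, and others use it in bursts (several times over two weeks, then not at all for two weeks, then again for another two weeks).

Some sample data: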

user_id | event_time
1 | 2022-06-20 00:00:00+00
2 | 2022-06-21 00:01:00+00
1 | 2022-06-24 00:00:00+00
1 | 2022-07-01 00:02:34+00
3 | 2022-07-01 00:03:45+00
1 | 2022-07-18 00:00:00+00
3 | 2022-07-19 01:00:00+00

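Question

How can I write a query that determines how many weeks each user_id appears in?

Example

Ideally, the query would return the number of user_ids in each bucket, where a user_id's bucket is the number of weeks in which it appeared at least once.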

one_occurrence | two_occurrences | three_occurrences | four_occurrences | more_than_four
             1 |               1 |                 1 |                0 |              0

user_id 1 has four events but counts as only three occurrences, because:

+ 1 - they clicked the page twice within the week of 6/20
+ 1 - they clicked the page once the week of 7/01
+ 1 - and finally, they clicked the page once the week of 7/18
= 3 weeks where user_id `1` clicked at least once.

Finally, anything greater than 4 is grouped together.

The same applies to the other two user_ids, but their cases are simpler.

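This assumes you want weeks that start on Sunday. If you want weeks that start on Monday, use date_trunc('week', ...) on its own, without the + 1 / - 1 adjustment in the query below.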
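
To see why shifting by one day moves the week boundary, here is a minimal Python sketch that mirrors the two SQL expressions outside the database. It is only an illustration, not part of the query: the function names are hypothetical, and it assumes that date_trunc('week', ...) truncates to Monday, as PostgreSQL documents.

```python
from datetime import date, timedelta


def monday_week_start(d: date) -> date:
    """Mirror date_trunc('week', d): truncate to the Monday of d's week."""
    return d - timedelta(days=d.weekday())  # date.weekday(): Monday == 0


def sunday_week_start(d: date) -> date:
    """Mirror date_trunc('week', d + 1)::date - 1: truncate to Sunday."""
    # Shift forward a day, truncate to Monday, then shift back a day.
    return monday_week_start(d + timedelta(days=1)) - timedelta(days=1)


# Saturday, Sunday, and Monday around one week boundary in the sample data.
for d in (date(2022, 6, 25), date(2022, 6, 26), date(2022, 6, 27)):
    print(d, monday_week_start(d), sunday_week_start(d))
```

With Sunday-based weeks, Sunday 2022-06-26 opens a new week; with Monday-based weeks it still belongs to the week that started on 2022-06-20.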

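The problem is straightforward once you have computed the start of each week: group by user and week, count how many such groups each user has, and then pivot those counts into columns.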

-- Grouping yields one row per (user_id, week); the window count then gives weeks per user.
with data as (
    select distinct user_id, count(*) over (partition by user_id) as week_count
    from events
    -- Shift by one day around date_trunc() so that weeks start on Sunday.
    group by user_id, date_trunc('week', event_time::date + 1)::date - 1
)
-- Pivot: count the users that fall into each weeks-active bucket.
select
    count(case when week_count = 1 then 1 end) as one_occurrence,
    count(case when week_count = 2 then 1 end) as two_occurrences,
    count(case when week_count = 3 then 1 end) as three_occurrences,
    count(case when week_count = 4 then 1 end) as four_occurrences,
    count(case when week_count > 4 then 1 end) as more_than_four
from data;
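
As a sanity check, the same grouping and pivot can be reproduced in a few lines of Python against the sample rows from the question. This is a rough sketch rather than a replacement for the query: the variable and function names are hypothetical, and it assumes the events are bucketed by their UTC date (in PostgreSQL, event_time::date depends on the session time zone).

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows from the question as (user_id, event date), assuming UTC.
events = [
    ("1", date(2022, 6, 20)),
    ("2", date(2022, 6, 21)),
    ("1", date(2022, 6, 24)),
    ("1", date(2022, 7, 1)),
    ("3", date(2022, 7, 1)),
    ("1", date(2022, 7, 18)),
    ("3", date(2022, 7, 19)),
]


def sunday_week_start(d: date) -> date:
    """Mirror date_trunc('week', d + 1)::date - 1 (Sunday-based weeks)."""
    shifted = d + timedelta(days=1)
    return shifted - timedelta(days=shifted.weekday()) - timedelta(days=1)


# Like GROUP BY user_id, week: keep one entry per distinct (user, week) pair.
user_weeks = {(user_id, sunday_week_start(d)) for user_id, d in events}

# Like count(*) over (partition by user_id): number of active weeks per user.
week_count = Counter(user_id for user_id, _ in user_weeks)

# Pivot: count users per bucket, lumping everything above 4 together.
labels = ["one_occurrence", "two_occurrences", "three_occurrences",
          "four_occurrences", "more_than_four"]
buckets = dict.fromkeys(labels, 0)
for n in week_count.values():
    buckets[labels[min(n, 5) - 1]] += 1

print(buckets)
```

For the sample data this gives three active weeks for user 1, one for user 2, and two for user 3, which matches the expected row of 1, 1, 1, 0, 0.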
