Count the number of weeks in which each ID appears

Background

I have an events table, simplified here for brevity:

event_time timestamp with time zone NOT NULL,
user_id character varying(100) NOT NULL,
... more

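An example of an event is a user clicking something on a web page. Many users will create many events. The sample is only about 100k users; some will use the web app once, some may use it regularly, and others may use it in bursts (many times over two weeks, then not at all for two weeks, then again for two weeks).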

Some sample data:

user_id | event_time
1 | 2022-06-20 00:00:00+00
2 | 2022-06-21 00:01:00+00
1 | 2022-06-24 00:00:00+00
1 | 2022-07-01 00:02:34+00
3 | 2022-07-01 00:03:45+00
1 | 2022-07-18 00:00:00+00
3 | 2022-07-19 01:00:00+00

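Question

How can I write a query that determines how frequently user_ids appear across weeks?

Example

Ideally, the query would return, for each number of weeks, how many user_ids appeared at least once in exactly that many weeks: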

one_occurrence | two_occurrences | three_occurrences | four_occurrences | more_than_four
1              | 1               | 1                 | 0                | 0

user_id `1` has four events but only three occurrences, because:

+ 1 - they clicked the page twice within the week of 6/20
+ 1 - they clicked the page once the week of 7/01
+ 1 - and finally, they clicked the page once the week of 7/18
= 3 weeks where user_id `1` clicked at least once.
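To make the distinction concrete: what is counted is distinct weeks, not events. The minimal Python sketch below checks this for user 1. For simplicity it uses ISO (Monday-start) weeks via `date.isocalendar()` rather than Sunday-start weeks; for these particular dates both conventions give the same result. The variable names are illustrative only.

```python
from datetime import date

# User 1's events from the sample data (time of day dropped; only the date matters here).
user1_events = [
    date(2022, 6, 20),
    date(2022, 6, 24),
    date(2022, 7, 1),
    date(2022, 7, 18),
]

# ISO (Monday-start) week key: (ISO year, ISO week number).
# Two events in the same week collapse into a single key.
weeks = {d.isocalendar()[:2] for d in user1_events}

print(len(user1_events))  # 4
print(len(weeks))         # 3
```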

Finally, any count greater than 4 is grouped together in `more_than_four`.

The same goes for the other two user_ids, but they are simpler.

Answer

This assumes you want weeks that start on Sunday. If you want weeks that start on Monday, just use date_trunc() without the +1/-1 adjustment.
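Why the +1/-1 works: `date_trunc('week', ...)` truncates to the Monday that starts the week. Shifting the date forward one day pushes a Sunday into the following Monday-based week, and subtracting one day from the resulting Monday lands on the Sunday that starts the desired week. The Python sketch below mirrors that arithmetic on plain dates; `sunday_week_start` is a hypothetical helper name, and time zones are ignored (in PostgreSQL, `event_time::date` on a `timestamp with time zone` is evaluated in the session's time zone).

```python
from datetime import date, timedelta


def sunday_week_start(d: date) -> date:
    """Mirror of date_trunc('week', d + 1)::date - 1 for a date d."""
    shifted = d + timedelta(days=1)
    # Monday of the shifted date's week (what date_trunc('week', ...) returns).
    monday = shifted - timedelta(days=shifted.weekday())
    # Step back one day to the Sunday that starts the week.
    return monday - timedelta(days=1)


print(sunday_week_start(date(2022, 6, 25)))  # Saturday -> 2022-06-19
print(sunday_week_start(date(2022, 6, 26)))  # Sunday   -> 2022-06-26
```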

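Once you have computed the start of each week, it's a simple matter of grouping by user and week, counting how many such groups each user has, and then pivoting those counts into columns.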

with data as (
  -- GROUP BY yields one row per (user, Sunday-based week);
  -- the window count then counts those week rows per user,
  -- and DISTINCT collapses them back to one row per user.
  select distinct
    user_id,
    count(*) over (partition by user_id) as week_count
  from events
  group by user_id, date_trunc('week', event_time::date + 1)::date - 1
)
select
  count(case when week_count = 1 then 1 end) as one_occurrence,
  count(case when week_count = 2 then 1 end) as two_occurrences,
  count(case when week_count = 3 then 1 end) as three_occurrences,
  count(case when week_count = 4 then 1 end) as four_occurrences,
  count(case when week_count > 4 then 1 end) as more_than_four
from data;
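As a cross-check of the query's logic (not of PostgreSQL itself), here is a minimal Python sketch that runs the same three steps on the sample rows from the question. It assumes the session time zone is UTC, so each `event_time::date` is simply the date shown; the helper and variable names are illustrative, not part of the original answer.

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows from the question as (user_id, event date), assuming a UTC session.
events = [
    ("1", date(2022, 6, 20)),
    ("2", date(2022, 6, 21)),
    ("1", date(2022, 6, 24)),
    ("1", date(2022, 7, 1)),
    ("3", date(2022, 7, 1)),
    ("1", date(2022, 7, 18)),
    ("3", date(2022, 7, 19)),
]


def sunday_week_start(d: date) -> date:
    # Same as date_trunc('week', d + 1)::date - 1: the Sunday on or before d.
    shifted = d + timedelta(days=1)
    return shifted - timedelta(days=shifted.weekday() + 1)


# Step 1: distinct (user, week) pairs, like GROUP BY user_id, <week start>.
user_weeks = {(user, sunday_week_start(d)) for user, d in events}

# Step 2: number of distinct weeks per user, like count(*) over (partition by user_id).
week_count = Counter(user for user, _ in user_weeks)

# Step 3: pivot into five buckets; any count above 4 is capped at 5 ("more than four").
buckets = Counter(min(n, 5) for n in week_count.values())
result = {
    "one_occurrence": buckets[1],
    "two_occurrences": buckets[2],
    "three_occurrences": buckets[3],
    "four_occurrences": buckets[4],
    "more_than_four": buckets[5],
}
print(result)
# {'one_occurrence': 1, 'two_occurrences': 1, 'three_occurrences': 1, 'four_occurrences': 0, 'more_than_four': 0}
```

The result matches the expected output in the question: user 2 appears in one week, user 3 in two, and user 1 in three.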
