r - Get the number of unique users in the last 7 days



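I have a dataset in which I want to find the people who were active in the last 7 days (that is, the 7 days before a given date). For example,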

date<- c('2009-01-03', '2009-01-03', '2009-01-03', '2009-01-04', '2009-01-05', '2009-02-01')
person<- c('Abe', 'John', 'Abe', 'Kate', 'Jessica', 'Anu')
df<- data.frame(date, person)

I want to create a column named last_seven_days_active that holds the count of unique people who were active in the last 7 days.

last_seven_days_active
0
0
0
2
3
0

I tried this. Any suggestions?

library(zoo)
df$last_seven_days_active <- rollsumr(df$person, k = 8, fill = NA)

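An option with between and map:

library(dplyr)
library(purrr)
df %>%
  # convert date to Date first so the window comparisons are done on dates
  mutate(date = as.Date(date),
         last_seven_days_active = map_dbl(date,
           ~ n_distinct(person[between(date, .x - 7, .x) & date != .x])))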
#       date  person last_seven_days_active
#1 2009-01-03     Abe                      0
#2 2009-01-03    John                      0
#3 2009-01-03     Abe                      0
#4 2009-01-04    Kate                      2
#5 2009-01-05 Jessica                      3
#6 2009-02-01     Anu                      0
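
The window used above may be easier to see for a single date. between() is inclusive at both ends, so the extra date != .x condition drops the current day and leaves the 7 days from .x - 7 to .x - 1. A minimal base R sketch of that check for the 2009-01-05 row (the names x, in_window, window_people and n_active are only for illustration):

```r
# Same sample data as in the question, with date converted to Date
date <- as.Date(c('2009-01-03', '2009-01-03', '2009-01-03',
                  '2009-01-04', '2009-01-05', '2009-02-01'))
person <- c('Abe', 'John', 'Abe', 'Kate', 'Jessica', 'Anu')

x <- as.Date('2009-01-05')  # the row being evaluated

# Inclusive range [x - 7, x], then exclude the current day itself
in_window <- date >= x - 7 & date <= x & date != x

window_people <- unique(person[in_window])  # distinct people in the window
n_active <- length(window_people)           # value for this row
```

Here the window picks up the 2009-01-03 and 2009-01-04 rows, which is why the 2009-01-05 row gets 3 (Abe, John, Kate).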

A base R solution:

df$date <- as.Date(as.character(df$date))
df$last_seven_days_active <- with(df, sapply(date, function(x) length(unique(person[date >= x - 7 & date < x]))))

Output:

        date  person last_seven_days_active
1 2009-01-03     Abe                      0
2 2009-01-03    John                      0
3 2009-01-03     Abe                      0
4 2009-01-04    Kate                      2
5 2009-01-05 Jessica                      3
6 2009-02-01     Anu                      0
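
If the window length needs to vary, the same idea could be wrapped in a small function. This is only a sketch: the helper name count_active_before and its window argument are hypothetical, not part of any package, and it assumes "last 7 days" excludes the current day, as in the expected output.

```r
# Hypothetical helper: for each row, count the distinct people seen in the
# `window` days strictly before that row's date.
count_active_before <- function(date, person, window = 7) {
  date <- as.Date(date)
  vapply(seq_along(date), function(i) {
    in_window <- date >= date[i] - window & date < date[i]
    length(unique(person[in_window]))
  }, integer(1))
}

df <- data.frame(
  date = c('2009-01-03', '2009-01-03', '2009-01-03',
           '2009-01-04', '2009-01-05', '2009-02-01'),
  person = c('Abe', 'John', 'Abe', 'Kate', 'Jessica', 'Anu')
)

df$last_seven_days_active <- count_active_before(df$date, df$person)
```

Like the sapply version, this compares every row against every other row, so it is best suited to small or medium-sized data.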

An option using data.table:

library(data.table)
setDT(df)[, date := as.IDate(date, format = "%Y-%m-%d")]
df[, days7ago := date - 7L]
df[, last_seven_days_active :=
     df[df, on = .(date >= days7ago, date < date), by = .EACHI,
        length(unique(person[!is.na(person)]))]$V1
]

Output:

         date  person   days7ago last_seven_days_active
1: 2009-01-03     Abe 2008-12-27                      0
2: 2009-01-03    John 2008-12-27                      0
3: 2009-01-03     Abe 2008-12-27                      0
4: 2009-01-04    Kate 2008-12-28                      2
5: 2009-01-05 Jessica 2008-12-29                      3
6: 2009-02-01     Anu 2009-01-25                      0
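
The !is.na(person) filter in the join matters for rows with no earlier activity. With the default nomatch behaviour, an unmatched row comes back with person set to NA, and unique() would count that NA as one value. A small base R illustration of the difference (the variable names are made up for this sketch):

```r
# What the join yields for `person` when no rows fall inside the window
no_match <- NA_character_

# Without the filter, the NA is counted as one distinct value
naive_count <- length(unique(no_match))

# Dropping NA first gives the intended count of zero
filtered_count <- length(unique(no_match[!is.na(no_match)]))
```

Without the filter, the rows that show 0 in the output above would be expected to show 1 instead.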
