r语言 - 6个月内唯一观测值的计数(每个观测值)



你如何在R/tidyverse中解决以下问题?:

样本数据:

tibble(
date = seq(as.Date(paste0("2010-01-",runif(1,1,25))), by = "month", length.out = 24),
machine_ID = sample(letters[1:10],size = 24,replace = T),
machine_cat = rep(c(1,2),12)
)

目的:

添加一个名为last6m的列,用于统计在关联的machine_cat中最近6个月内观察到的唯一machine_ID的数量。

tidyverse和不循环是首选(purrr是ok的)。

如果有人能快速看一下,我很感激!提前谢谢:-)

根据MrFlick和r2evans建议的帖子得到如下解决方案:G. Grothendieck的回答

library(tidyverse)
library(lubridate)
library(sqldf)
data <- tibble(
date = seq(as.Date(paste0("2010-01-",runif(1,1,25))), by = "month", length.out = 24),
machine_ID = sample(letters[1:10],size = 24,replace = T),
machine_cat = rep(c(1,2),12)
)
sqldf("
SELECT a.*, COUNT(distinct(b.machine_ID)) AS last6m
FROM data a
LEFT JOIN data b
ON a.machine_cat = b.machine_cat
AND (b.date between a.date - 180 AND a.date)
GROUP BY a.rowid
") %>% arrange(machine_cat,date)

相关内容

  • 没有找到相关文章

最新更新