你如何在R/tidyverse中解决以下问题?:
样本数据:
tibble(
date = seq(as.Date(paste0("2010-01-",runif(1,1,25))), by = "month", length.out = 24),
machine_ID = sample(letters[1:10],size = 24,replace = T),
machine_cat = rep(c(1,2),12)
)
目的:
添加一个名为last6m
的列,用于统计在关联的machine_cat
中最近6个月内观察到的唯一machine_ID
的数量。
tidyverse和不循环是首选(purrr是ok的)。
如果有人能快速看一下,我很感激!提前谢谢:-)
根据MrFlick和r2evans建议的帖子得到如下解决方案:G. Grothendieck的回答
library(tidyverse)
library(lubridate)
library(sqldf)
data <- tibble(
date = seq(as.Date(paste0("2010-01-",runif(1,1,25))), by = "month", length.out = 24),
machine_ID = sample(letters[1:10],size = 24,replace = T),
machine_cat = rep(c(1,2),12)
)
sqldf("
SELECT a.*, COUNT(distinct(b.machine_ID)) AS last6m
FROM data a
LEFT JOIN data b
ON a.machine_cat = b.machine_cat
AND (b.date between a.date - 180 AND a.date)
GROUP BY a.rowid
") %>% arrange(machine_cat,date)