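r - Conditionally remove individuals from longitudinal data

I have a longitudinal dataset from which I want to drop individuals (id) who do not meet the criterion indicated by criteria == 1 at any time point. For context, say criteria indicates whether an individual lived in the region of interest at any time during the period. Here is some toy data with a structure similar to mine: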

id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,1)
criteria <- c(1,0,0,0,0,0,0,0,0,1,1,1,0,0,1)

df <- data.frame(cbind(id,time,event, criteria))
> df
   id time event criteria
1   1    1     0        1
2   1    2     1        0
3   1    3     0        0
4   2    1     1        0
5   2    2     0        0
6   2    3     0        0
7   3    1     0        0
8   3    2     0        0
9   3    3     0        0
10  4    1     0        1
11  4    2     1        1
12  4    3     0        1
13  5    1     1        0
14  5    2     0        0
15  5    3     1        1

So, dropping every id that has criteria == 0 at all time points (time) should give the following final result:

   id time event criteria
1   1    1     0        1
2   1    2     1        0
3   1    3     0        0
4   4    1     0        1
5   4    2     1        1
6   4    3     0        1
7   5    1     1        0
8   5    2     0        0
9   5    3     1        1

I have been trying to do this with dplyr::group_by(id) and then filtering on the criterion, but that does not give the result I want. I would prefer a tidyverse solution! :D

Thanks!

library(dplyr)

df %>%
  group_by(id) %>%
  # flag ids that meet criteria == 1 at least once
  mutate(is_good = any(criteria == 1)) %>%
  # keep only the flagged ids (drops ids where criteria is always 0)
  filter(is_good)
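
For reference, here is a minimal sketch of the grouped filter without the helper column, assuming the toy data from the question. The name kept is only illustrative. filter() accepts a condition that evaluates to a single TRUE or FALSE per group, so any(criteria == 1) keeps or drops all rows of an id together.

```r
library(dplyr)

# Toy data from the question, rebuilt so the example is self-contained
df <- data.frame(
  id       = rep(1:5, each = 3),
  time     = rep(1:3, times = 5),
  event    = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1),
  criteria = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1)
)

kept <- df %>%
  group_by(id) %>%
  # one logical value per id: TRUE keeps every row of that id
  filter(any(criteria == 1)) %>%
  ungroup()

unique(kept$id)
# ids 2 and 3 never have criteria == 1, so only 1, 4 and 5 remain
```

Calling ungroup() at the end is optional, but it avoids surprises if later steps are not meant to run per id.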

If you are willing to look into data.table, which I recommend, this is straightforward:


library(data.table)
setDT(df) # make it a data.table
df[ , .SD[ !all(criteria==0) ], by=id ]

For a general introduction to and explanation of the .SD idiom, see this page:

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
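
As a possible variant, the same per-id test can be written with if inside j. This is only a sketch on the question's toy data, and the names dt and kept are illustrative. Groups for which the if condition is FALSE return NULL and therefore contribute no rows.

```r
library(data.table)

# Toy data from the question, built directly as a data.table
dt <- data.table(
  id       = rep(1:5, each = 3),
  time     = rep(1:3, times = 5),
  event    = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1),
  criteria = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1)
)

# Return the whole subset (.SD) only for ids with at least one criteria == 1
kept <- dt[, if (any(criteria == 1)) .SD, by = id]

unique(kept$id)
```

Building dt with data.table() rather than setDT(df) leaves any existing data.frame untouched, since setDT converts its argument by reference.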
