R: subsetting by a condition within each category



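I am working with a survey dataset (ESS) that covers several countries in each wave and a number of individuals in each country and wave. It looks like this: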

Country Wave
AT 1
AT 1
AT 1
AT 2
AT 3
AT 3
AT 4
AT 4
AT 5
AT 6
AT 7
AT 8
AT 9
AT 9
BE 1
BE 2
BE 2
BE 3
BE 4
BE 5
BE 6
BE 7
BE 7
BE 9
BE 9

You can keep only the Country groups in which all of the values 1 to 9 appear in Wave.

library(dplyr)
df1 <- df %>% group_by(Country) %>% filter(all(1:9 %in% Wave)) %>% ungroup
df1
#   Country  Wave
#   <chr>   <int>
# 1 AT          1
# 2 AT          1
# 3 AT          1
# 4 AT          2
# 5 AT          3
# 6 AT          3
# 7 AT          4
# 8 AT          4
# 9 AT          5
#10 AT          6
#11 AT          7
#12 AT          8
#13 AT          9
#14 AT          9
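
To see why the filter keeps AT and drops BE, the grouped condition can be evaluated by hand. This is only an illustrative sketch: the vectors at_waves and be_waves are hypothetical names, with values copied from the example data above.

```r
# Waves observed for each country, copied from the example data
at_waves <- c(1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 9)
be_waves <- c(1, 2, 2, 3, 4, 5, 6, 7, 7, 9, 9)

# %in% yields one logical per required wave; all() requires every one to be TRUE
at_ok <- all(1:9 %in% at_waves)
be_ok <- all(1:9 %in% be_waves)

# setdiff() lists the required waves that never occur for the country
be_missing <- setdiff(1:9, be_waves)

print(at_ok)       # TRUE
print(be_ok)       # FALSE
print(be_missing)  # 8
```

Inside group_by(Country), filter() evaluates the same expression once per country and keeps or drops all of that country's rows together.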

This can also be written in base R and data.table:

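# Base R: ave() evaluates the condition per Country and recycles it to every row
df1 <- subset(df, as.logical(ave(Wave, Country,
                                 FUN = function(x) all(1:9 %in% x))))

# data.table: setDT() converts df in place
library(data.table)
setDT(df)[, .SD[all(1:9 %in% Wave)], Country]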

In R there are usually several ways to solve the same problem. Here is a slight variation:

library(dplyr)

df <- data.frame(
  stringsAsFactors = FALSE,
  Country = c("AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT",
              "AT", "AT", "AT", "AT", "BE", "BE", "BE", "BE", "BE", "BE",
              "BE", "BE", "BE", "BE", "BE"),
  Wave = c(1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 9, 1, 2, 2, 3, 4, 5, 6,
           7, 7, 9, 9)
)

df_new <- df %>%
  # Factors are needed so that .drop = FALSE keeps empty Country/Wave groups
  mutate(Country = as.factor(Country), Wave = as.factor(Wave)) %>%
  group_by(Country, Wave, .drop = FALSE) %>%
  summarize(n = n()) %>%               # rows per Country/Wave, 0 if none
  ungroup() %>%
  group_by(Country) %>%
  summarize(all_waves = min(n) > 0)    # TRUE only if every wave has rows

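First we group by country and wave and count the rows in each group; because both columns are factors and .drop = FALSE is set, country and wave combinations with no rows are kept with a count of 0. Then we group by country and flag the countries whose minimum count is greater than 0. The result is one row per country with a logical all_waves column, not the filtered data itself.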
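
The same counting idea can be sketched in base R and carried one step further to filter the original rows. This is a minimal sketch rather than part of the answer above: the names counts, all_waves and complete are hypothetical, and it assumes the waves of interest are exactly 1 to 9.

```r
# Example data, same values as above
df <- data.frame(
  Country = rep(c("AT", "BE"), times = c(14, 11)),
  Wave = c(1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 9,
           1, 2, 2, 3, 4, 5, 6, 7, 7, 9, 9),
  stringsAsFactors = FALSE
)

# Country-by-wave table of row counts; explicit levels keep a column for
# every wave from 1 to 9, even one that no country took part in
counts <- table(df$Country, factor(df$Wave, levels = 1:9))

# A country qualifies when every wave column has at least one row
all_waves <- apply(counts > 0, 1, all)

# Keep only the rows of qualifying countries
complete <- df[df$Country %in% names(all_waves)[all_waves], ]

print(all_waves)
print(unique(complete$Country))
```

Fixing the levels at 1:9 is a deliberate choice here: when the levels are derived from the data, a wave that is absent for every country produces no level at all and so cannot be detected as missing.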
