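I am working with a survey dataset (ESS) that contains several countries in each wave and a number of individuals per country in each wave. It looks like this:
Country | Wave |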
---|---|
AT | 1 |
AT | 1 |
AT | 1 |
AT | 2 |
AT | 3 |
AT | 3 |
AT | 4 |
AT | 4 |
AT | 5 |
AT | 6 |
AT | 7 |
AT | 8 |
AT | 9 |
AT | 9 |
BE | 1 |
BE | 2 |
BE | 2 |
BE | 3 |
BE | 4 |
BE | 5 |
BE | 6 |
BE | 7 |
BE | 7 |
BE | 9 |
BE | 9 |
You can keep every Country for which all of the values 1 to 9 occur in Wave.
library(dplyr)
df1 <- df %>% group_by(Country) %>% filter(all(1:9 %in% Wave)) %>% ungroup
df1
# Country Wave
# <chr> <int>
# 1 AT 1
# 2 AT 1
# 3 AT 1
# 4 AT 2
# 5 AT 3
# 6 AT 3
# 7 AT 4
# 8 AT 4
# 9 AT 5
#10 AT 6
#11 AT 7
#12 AT 8
#13 AT 9
#14 AT 9
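To see why BE is dropped, it can help to run the test inside filter() on plain vectors. This is only an illustrative sketch: the names waves_at and waves_be are made up here and simply hold the Wave values from the example table.

```r
# Wave values per country, copied from the example table (names are illustrative)
waves_at <- c(1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 9)
waves_be <- c(1, 2, 2, 3, 4, 5, 6, 7, 7, 9, 9)

# %in% returns one logical per required wave; all() collapses them to one value
at_complete <- all(1:9 %in% waves_at)
be_complete <- all(1:9 %in% waves_be)

# setdiff() lists the waves that are missing for a country
missing_be <- setdiff(1:9, waves_be)

print(at_complete)
print(be_complete)
print(missing_be)
```

Because the condition returns a single TRUE or FALSE per group, filter() keeps or drops all rows of a country at once.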
The same thing can also be written in base R and data.table:
#Base R
df1 <- subset(df, as.logical(ave(Wave, Country,
FUN = function(x) all(1:9 %in% x))))
#data.table
library(data.table)
setDT(df)[, .SD[all(1:9 %in% Wave)], Country]
In R there are usually several ways to solve the same problem. Here is a slight variation:
library(dplyr)
df<-data.frame(stringsAsFactors=FALSE,
Country = c("AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT",
"AT", "AT", "AT", "AT", "BE", "BE", "BE", "BE", "BE", "BE",
"BE", "BE", "BE", "BE", "BE"),
Wave = c(1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 9, 1, 2, 2, 3, 4, 5, 6,
7, 7, 9, 9)
)
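df_new <- df %>%
  # Factors are needed so that .drop = FALSE keeps empty Country/Wave groups
  mutate(Country = as.factor(Country), Wave = as.factor(Wave)) %>%
  group_by(Country, Wave, .drop = FALSE) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  group_by(Country) %>%
  # TRUE only if the country has at least one row in every wave
  summarize(all_waves = min(n) > 0)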
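First we group by country and wave and count the rows in each group, keeping empty groups. Then we group by country alone and flag the countries whose smallest count is greater than 0. Note that the result is one row per country with an all_waves flag, not the filtered data itself.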
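The flag table can then be used to subset the original rows. The following is a minimal sketch, not part of the answer above: it assumes the waves of interest are exactly 1 to 9 (set explicitly as factor levels, so a wave missing from every country would still be counted as empty), and the names flags and df_complete are made up for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Country = c(rep("AT", 14), rep("BE", 11)),
  Wave = c(1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 9,
           1, 2, 2, 3, 4, 5, 6, 7, 7, 9, 9),
  stringsAsFactors = FALSE
)

# One row per country with a TRUE/FALSE flag (assumption: waves are 1 to 9)
flags <- df %>%
  mutate(Country = as.factor(Country),
         Wave = factor(Wave, levels = 1:9)) %>%
  group_by(Country, Wave, .drop = FALSE) %>%
  summarize(n = n(), .groups = "drop") %>%
  group_by(Country) %>%
  summarize(all_waves = min(n) > 0) %>%
  # Back to character so the join key has the same type as in df
  mutate(Country = as.character(Country))

# Keep only the rows of countries that appear in every wave
df_complete <- df %>%
  semi_join(filter(flags, all_waves), by = "Country")

print(flags)
print(df_complete)
```

With the example data this should leave only the AT rows, matching the result of the filter() approach.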