I have a data frame as shown below. Variables a and b are continuous, and variables v1-v7 are binary.
> df <- data.frame(a= c(1,1,2,3,5),
+ b = c(3, 6,8, 2, 4),
+ v1 = c(0,0,0,0,0),
+ v2 = c(1,0,0,0,0),
+ v3 = c(0,1,1,1,1),
+ v4 = c(0,1,1,1,1),
+ v5 = c(0,0,0,0,1),
+ v6 = c(0,0,0,0,0),
+ v7 = c(0,0,0,0,0))
> df
a b v1 v2 v3 v4 v5 v6 v7
1 1 3 0 1 0 0 0 0 0
2 1 6 0 0 1 1 0 0 0
3 2 8 0 0 1 1 0 0 0
4 3 2 0 0 1 1 0 0 0
5 5 4 0 0 1 1 1 0 0
>
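I want to create seven subsamples from the data frame shown above. Specifically, each subsample should include only the variables a and b, restricted to the rows where the corresponding variable v1-v7 equals 1. For example,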
> df1 <- df %>% filter(v1==1)
> df1
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
> df2 <- df %>% filter(v2==1)
> df2
a b v1 v2 v3 v4 v5 v6 v7
1 1 3 0 1 0 0 0 0 0
> df3 <- df %>% filter(v3==1)
> df3
a b v1 v2 v3 v4 v5 v6 v7
1 1 6 0 0 1 1 0 0 0
2 2 8 0 0 1 1 0 0 0
3 3 2 0 0 1 1 0 0 0
4 5 4 0 0 1 1 1 0 0
How can I perform all of these operations at once in R? Thanks.
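Here is a way to do it using lapply(). You are better off keeping your results in a list. The subsample for v1 is subsamples[[1]], and so on. The indices 3:9 are the column positions of v1 to v7 in df.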
subsamples <- lapply(3:9, function(x) df[df[[x]]==1, ])
subsamples
[[1]]
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
[[2]]
a b v1 v2 v3 v4 v5 v6 v7
1 1 3 0 1 0 0 0 0 0
[[3]]
a b v1 v2 v3 v4 v5 v6 v7
2 1 6 0 0 1 1 0 0 0
3 2 8 0 0 1 1 0 0 0
4 3 2 0 0 1 1 0 0 0
5 5 4 0 0 1 1 1 0 0
[[4]]
a b v1 v2 v3 v4 v5 v6 v7
2 1 6 0 0 1 1 0 0 0
3 2 8 0 0 1 1 0 0 0
4 3 2 0 0 1 1 0 0 0
5 5 4 0 0 1 1 1 0 0
[[5]]
a b v1 v2 v3 v4 v5 v6 v7
5 5 4 0 0 1 1 1 0 0
[[6]]
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
[[7]]
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
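The question asks for only a and b in each subsample, whereas the output above keeps every column. A minimal sketch of one way to extend the lapply() approach, selecting the two columns and naming each list element after its indicator variable. The names `vars` and `subsamples_ab` are illustrative and do not come from the answer.

```r
df <- data.frame(a = c(1, 1, 2, 3, 5),
                 b = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Indicator column names (assumed to be v1..v7, as in the question)
vars <- paste0("v", 1:7)

# For each indicator, keep the rows where it equals 1 and only columns a and b
subsamples_ab <- lapply(vars, function(v) df[df[[v]] == 1, c("a", "b")])
names(subsamples_ab) <- vars

# Access a subsample by name instead of by position
subsamples_ab$v5
```

Referring to columns by name rather than by position (3:9) keeps the code working if columns are later added or reordered.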
In dplyr, you can specify variable names as strings with the .data pronoun (see data masking).
df_samples <- list()
for(i in 1:7)
df_samples[[i]] <- filter(df, .data[[paste0("v", i)]] == 1)
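The same idea can be written without an index counter, which gives a named list directly. This is only a sketch, assuming dplyr is installed; `vars` and `df_samples_named` are illustrative names, not part of the answer above.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 2, 3, 5),
                 b = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

vars <- paste0("v", 1:7)

# lapply() keeps the names of its input, so naming the vector names the result
df_samples_named <- lapply(setNames(vars, vars), function(v) {
  filter(df, .data[[v]] == 1)
})

# Quick check: number of rows in each subsample
row_counts <- sapply(df_samples_named, nrow)
row_counts
```

Checking the row counts is a cheap way to spot indicators that never equal 1 (here v1, v6 and v7) before working with the subsamples.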
Just loop over the columns "v1" to "v7", apply filter to each, and return the results in a list:
library(dplyr)
library(stringr)
library(purrr)
lst1 <- str_subset(names(df), "^v\\d+") %>%
  map(~ df %>%
    filter(if_all(all_of(.x), ~ .x == 1)))
names(lst1) <- str_c('df', seq_along(lst1))
It is better to keep the results in a list. If we do need to create objects in the global environment (not recommended), use list2env on the named list:
list2env(lst1, .GlobalEnv)
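As a possible alternative to list2env, the list can be used directly. The sketch below drops the empty subsamples and stacks the rest into one data frame with an identifier column. It builds the list with base R instead of stringr/purrr to stay short; `non_empty` and `combined` are illustrative names, and dplyr is assumed to be installed for bind_rows.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 2, 3, 5),
                 b = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Rebuild a named list equivalent to lst1 (names df1..df7)
vars <- grep("^v[0-9]+$", names(df), value = TRUE)
lst1 <- lapply(vars, function(v) df[df[[v]] == 1, ])
names(lst1) <- paste0("df", seq_along(lst1))

# Keep only the subsamples that contain at least one row
non_empty <- Filter(function(d) nrow(d) > 0, lst1)
names(non_empty)

# Stack all subsamples; the "subsample" column records which list element
# each row came from
combined <- bind_rows(lst1, .id = "subsample")
```

Keeping everything in one list or one stacked data frame avoids having seven loose objects in the workspace, and later steps can iterate over the subsamples or group by the `subsample` column.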