R: Creating subsets of a sample by several different variables at once



I have a data frame like the one shown below. Variables a and b are continuous, and variables v1-v7 are binary.

> df <- data.frame(a  = c(1, 1, 2, 3, 5),
+                  b  = c(3, 6, 8, 2, 4),
+                  v1 = c(0, 0, 0, 0, 0),
+                  v2 = c(1, 0, 0, 0, 0),
+                  v3 = c(0, 1, 1, 1, 1),
+                  v4 = c(0, 1, 1, 1, 1),
+                  v5 = c(0, 0, 0, 0, 1),
+                  v6 = c(0, 0, 0, 0, 0),
+                  v7 = c(0, 0, 0, 0, 0))
> df
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0
2 1 6  0  0  1  1  0  0  0
3 2 8  0  0  1  1  0  0  0
4 3 2  0  0  1  1  0  0  0
5 5 4  0  0  1  1  1  0  0

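I want to create seven subsamples from the data frame shown above. Specifically, I want seven subsamples that include only variables a and b, one for each of v1-v7, each containing the rows where that variable equals 1. For example,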

> df1 <- df %>% filter(v1 == 1)
> df1
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
> df2 <- df %>% filter(v2 == 1)
> df2
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0
> df3 <- df %>% filter(v3 == 1)
> df3
  a b v1 v2 v3 v4 v5 v6 v7
1 1 6  0  0  1  1  0  0  0
2 2 8  0  0  1  1  0  0  0
3 3 2  0  0  1  1  0  0  0
4 5 4  0  0  1  1  1  0  0

How can I perform these operations all at once in R? Thanks.

Here is a way to do it with lapply(). It is best to keep your results in a list. The subsample for v1 is subsamples[[1]], and so on.

subsamples <- lapply(3:9, function(x) df[df[[x]] == 1, ])
subsamples
[[1]]
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)

[[2]]
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0

[[3]]
  a b v1 v2 v3 v4 v5 v6 v7
2 1 6  0  0  1  1  0  0  0
3 2 8  0  0  1  1  0  0  0
4 3 2  0  0  1  1  0  0  0
5 5 4  0  0  1  1  1  0  0

[[4]]
  a b v1 v2 v3 v4 v5 v6 v7
2 1 6  0  0  1  1  0  0  0
3 2 8  0  0  1  1  0  0  0
4 3 2  0  0  1  1  0  0  0
5 5 4  0  0  1  1  1  0  0

[[5]]
  a b v1 v2 v3 v4 v5 v6 v7
5 5 4  0  0  1  1  1  0  0

[[6]]
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)

[[7]]
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
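The question asks for subsamples that contain only a and b, while the output above keeps every column. A minimal sketch of one way to get that in base R, assuming the indicator columns really are named v1 to v7 (the names `v_cols` and `subsamples_ab` are made up for this illustration):

```r
# Same example data as in the question
df <- data.frame(a  = c(1, 1, 2, 3, 5),
                 b  = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Refer to the indicator columns by name instead of by position 3:9
v_cols <- paste0("v", 1:7)

# For each indicator, keep the rows where it equals 1 and only columns a and b.
# which() skips NA comparisons, so missing indicator values would not
# produce all-NA rows in the result.
subsamples_ab <- lapply(v_cols, function(v) {
  df[which(df[[v]] == 1), c("a", "b")]
})
names(subsamples_ab) <- v_cols

subsamples_ab[["v3"]]  # rows 2 to 5 of df, columns a and b only
```

Selecting by name rather than by 3:9 keeps the code working if the column order of df changes.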

In dplyr, you can specify a variable name as a string by using the .data pronoun (see the documentation on data masking).

library(dplyr)
df_samples <- list()
for (i in 1:7) {
  # .data[[...]] looks up the column whose name is given as a string
  df_samples[[i]] <- filter(df, .data[[paste0("v", i)]] == 1)
}

Simply loop over the columns "v1" to "v7", apply filter to each, and return the results in a list.

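library(dplyr)
library(stringr)
library(purrr)

# Pick the column names v1..v7, then filter df once per column.
# The backslash must be doubled inside an R string: "\\d", not "\d".
lst1 <- str_subset(names(df), "^v\\d+") %>%
  map(~ df %>%
        filter(if_all(all_of(.x), ~ .x == 1)))
names(lst1) <- str_c("df", seq_along(lst1))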

It is better to keep them in a list. If we do need to create the objects in the global environment (not recommended), use list2env on the named list.

list2env(lst1, .GlobalEnv)
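To illustrate why keeping the subsamples in a list tends to be more convenient than separate global objects, here is a small base R sketch. It rebuilds a named list from the question's data (named df1 to df7 as above, but built with lapply rather than the dplyr pipeline, which is an assumption made to keep the example dependency-free) and then summarises all subsamples in one call:

```r
# Same example data as in the question
df <- data.frame(a  = c(1, 1, 2, 3, 5),
                 b  = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Stand-in for lst1: one filtered data frame per indicator column
lst1 <- lapply(paste0("v", 1:7), function(v) df[df[[v]] == 1, ])
names(lst1) <- paste0("df", seq_along(lst1))

# One call gives the size of every subsample, keyed by name
n_rows <- vapply(lst1, nrow, integer(1))

# Drop the empty subsamples without naming each object by hand
non_empty <- lst1[n_rows > 0]
names(non_empty)
```

With seven separate objects in the global environment, the same two steps would need each object to be referenced individually or fetched with get().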
