I have the following three data frames:
df1<- structure(list(plot = c(1L, 1L, 1L, 2L, 2L, 2L), lepsp = structure(c(1L,
2L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD",
"lepE"), class = "factor"), count = c(1L, 2L, 3L, 4L, 1L, 3L)), class = "data.frame",
row.names = c(NA, -6L))
df2<-structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), plantsp = structure(c(12L, 13L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 11L), .Label = c("H",
"I", "J", "K", "L", "M", "O", "P", "Q", "S", "U", "X", "Y", "Z"
), class = "factor"), leafArea = c(1L, 5L, 5L, 10L, 20L, 11L,
12L, 8L, 1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L)), class = "data.frame", row.names = c(NA,
-16L))
df3<-structure(list(lepsp = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD",
"lepE"), class = "factor"), plantsp = structure(c(6L, 7L, 8L,
6L, 5L, 2L, 3L, 4L, 1L, 6L, 8L, 8L, 1L), .Label = c("S", "T",
"U", "V", "W", "X", "Y", "Z"), class = "factor")), class = "data.frame", row.names = c(NA,
-13L))
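Essentially, I need to iterate over the unique subsets of df1 defined by two factor levels. In each iteration, I need to find the matches between df1 and df2 on a particular column. From the matches found between df2 and df1, I need to take the subset of rows in df2 and, based on a separate set of criteria, find the matches with df3, returning the rows that match on a different factor. To summarize, specifically for the data frames posted above:
1. For the i-th df1$plot and the j-th df1$lepsp, subset the rows of df2 whose df2$plot matches df1$plot. Similarly, subset the rows of df3 whose df3$lepsp matches df1$lepsp.
2. Within the df2 and df3 subsets from step 1, for those levels of df3$plantsp that also occur in df2$plantsp, return the matching rows of df2.
3. Return a data frame that records the associated i-th df1$plot and j-th df1$lepsp together with the rows of df2 that matched under the criterion in step 2.
4. Iterate over all levels of df1$lepsp within each df1$plot.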
The resulting output should look like this:
result<- structure(list(plot = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L
), lepsp = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 5L
), .Label = c("lepA", "lepB", "lepC", "lepD", "lepE"), class = "factor"),
lepcount = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 1L, 3L), plantsp = structure(c(3L,
4L, 3L, 3L, 2L, 1L, 3L, 5L, 5L, 1L), .Label = c("S", "U",
"X", "Y", "Z"), class = "factor"), leafarea = c(1L, 5L, 1L,
1L, 2L, 15L, 20L, 13L, 13L, 15L)), class = "data.frame", row.names = c(NA,
-10L))
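To make the steps above concrete, here is a minimal sketch of steps 1 and 2 for a single pair, plot 1 and lepA. The data frames are a compact re-creation of df2 and df3 with plain character columns instead of factors (an assumption made for brevity), and the names i, j, sub2, sub3 and hits are illustrative only.

```r
# Compact re-creation of df2 and df3 from above
# (character columns instead of factors; an assumption for brevity).
df2 <- data.frame(plot = rep(1:2, each = 8),
                  plantsp = c("X", "Y", "H", "I", "J", "K", "L", "M",
                              "O", "P", "Q", "S", "X", "Y", "Z", "U"),
                  leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 12L, 8L,
                               1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L))
df3 <- data.frame(lepsp = rep(c("lepA", "lepB", "lepC", "lepD", "lepE"),
                              times = c(3, 2, 6, 1, 1)),
                  plantsp = c("X", "Y", "Z", "X", "W", "T", "U", "V",
                              "S", "X", "Z", "Z", "S"))

i <- 1L       # plot under consideration (illustrative)
j <- "lepA"   # lepsp under consideration (illustrative)

# Step 1: restrict df2 to plot i and df3 to species j.
sub2 <- df2[df2$plot == i, ]
sub3 <- df3[df3$lepsp == j, ]

# Step 2: keep the rows of the df2 subset whose plantsp
# also occurs in the df3 subset.
hits <- sub2[sub2$plantsp %in% sub3$plantsp, ]
print(hits)
```

For this pair the sketch keeps plants X and Y, which correspond to the first two rows of the expected result.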
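Given this nested structure, I am struggling to come up with a way to put all the pieces together, but I know that something along the following lines might be useful:
for (i in unique(df1$plot)) {               # df1$plot is an integer column, so levels() does not apply
  for (j in unique(df1$lepsp)) {
    sub1 <- df2[df2$plot %in% i, ]          # rows of df2 recorded in plot i
    sub2 <- df3[df3$lepsp %in% j, ]         # rows of df3 listed for species j
    # Still unsolved: match sub1 and sub2 on plantsp and collect, for each
    # (i, j) pair, a data frame with plot, lepsp, plantsp and leafArea.
  }
}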
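We can keep the datasets in a list and use merge with Reduce. By default, merge joins on the column names that the two data frames share, so df1 and df2 are joined on plot, and that result is then joined with df3 on lepsp and plantsp: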
out <- Reduce(function(...) merge(...), list(df1, df2, df3))
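As a hedged variant, the same joins can be written with the key columns spelled out, which makes it easier to see what is matched on what. This sketch joins in a different order than the Reduce call (df1 with df3 first, then df2), which should give the same rows. The data are a compact re-creation with character columns, and lep_plants is an illustrative name; the renaming at the end only mirrors the column names of the expected result.

```r
# Compact re-creation of the three data frames
# (character columns instead of factors; an assumption for brevity).
df1 <- data.frame(plot = c(1L, 1L, 1L, 2L, 2L, 2L),
                  lepsp = c("lepA", "lepB", "lepC", "lepC", "lepD", "lepE"),
                  count = c(1L, 2L, 3L, 4L, 1L, 3L))
df2 <- data.frame(plot = rep(1:2, each = 8),
                  plantsp = c("X", "Y", "H", "I", "J", "K", "L", "M",
                              "O", "P", "Q", "S", "X", "Y", "Z", "U"),
                  leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 12L, 8L,
                               1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L))
df3 <- data.frame(lepsp = rep(c("lepA", "lepB", "lepC", "lepD", "lepE"),
                              times = c(3, 2, 6, 1, 1)),
                  plantsp = c("X", "Y", "Z", "X", "W", "T", "U", "V",
                              "S", "X", "Z", "Z", "S"))

# Join 1: attach to every (plot, lepsp) record the plants listed for that species.
lep_plants <- merge(df1, df3, by = "lepsp")

# Join 2: keep only the plants that were recorded in the same plot.
out <- merge(lep_plants, df2, by = c("plot", "plantsp"))

# Rename, reorder and sort to mirror the expected result.
names(out)[names(out) == "count"] <- "lepcount"
names(out)[names(out) == "leafArea"] <- "leafarea"
out <- out[order(out$plot, out$lepsp, out$plantsp),
           c("plot", "lepsp", "lepcount", "plantsp", "leafarea")]
rownames(out) <- NULL
print(out)
```

The rows are sorted by plot, lepsp and plantsp here, so their order within a species can differ from the expected result even though the content is the same.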
Or, using the tidyverse (inner_join also joins on the shared columns by default):
library(dplyr)
library(purrr)
list(df1, df2, df3) %>%
reduce(inner_join)
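For comparison, the nested loop described in the question can also be completed in base R. This is only a sketch under two assumptions: each (plot, lepsp) pair occurs once in df1, and each plant occurs at most once per plot in df2. The data are a compact re-creation with character columns, and pieces, hosts and hits are illustrative names.

```r
# Compact re-creation of the three data frames
# (character columns instead of factors; an assumption for brevity).
df1 <- data.frame(plot = c(1L, 1L, 1L, 2L, 2L, 2L),
                  lepsp = c("lepA", "lepB", "lepC", "lepC", "lepD", "lepE"),
                  count = c(1L, 2L, 3L, 4L, 1L, 3L))
df2 <- data.frame(plot = rep(1:2, each = 8),
                  plantsp = c("X", "Y", "H", "I", "J", "K", "L", "M",
                              "O", "P", "Q", "S", "X", "Y", "Z", "U"),
                  leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 12L, 8L,
                               1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L))
df3 <- data.frame(lepsp = rep(c("lepA", "lepB", "lepC", "lepD", "lepE"),
                              times = c(3, 2, 6, 1, 1)),
                  plantsp = c("X", "Y", "Z", "X", "W", "T", "U", "V",
                              "S", "X", "Z", "Z", "S"))

pieces <- list()
for (i in unique(df1$plot)) {
  # Only the species actually recorded in plot i.
  for (j in unique(as.character(df1$lepsp[df1$plot == i]))) {
    sub2  <- df2[df2$plot == i, ]                       # plants recorded in plot i
    hosts <- as.character(df3$plantsp[df3$lepsp == j])  # plants listed for species j
    # Positions of the listed plants within the plot; unmatched plants give 0
    # and are dropped by the indexing.
    hits <- sub2[match(hosts, as.character(sub2$plantsp), nomatch = 0), ]
    if (nrow(hits) == 0) next
    pieces[[length(pieces) + 1]] <- data.frame(
      plot = i,
      lepsp = j,
      lepcount = df1$count[df1$plot == i & df1$lepsp == j],
      plantsp = as.character(hits$plantsp),
      leafarea = hits$leafArea)
  }
}
result <- do.call(rbind, pieces)
print(result)
```

The join-based versions are shorter and avoid growing a list inside a loop, so the loop is mainly useful for checking that the joins do what the question describes.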