R - Iterating with nested criteria to return matches between three data frames

I have the following three data frames:

df1<- structure(list(plot = c(1L, 1L, 1L, 2L, 2L, 2L), lepsp = structure(c(1L, 
2L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD", 
"lepE"), class = "factor"), count = c(1L, 2L, 3L, 4L, 1L, 3L)), class = "data.frame", 
row.names = c(NA, -6L))
df2<-structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L), plantsp = structure(c(12L, 13L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 11L), .Label = c("H", 
"I", "J", "K", "L", "M", "O", "P", "Q", "S", "U", "X", "Y", "Z"
), class = "factor"), leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 
12L, 8L, 1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L)), class = "data.frame", row.names = c(NA, 
-16L))
df3<-structure(list(lepsp = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD", 
"lepE"), class = "factor"), plantsp = structure(c(6L, 7L, 8L, 
6L, 5L, 2L, 3L, 4L, 1L, 6L, 8L, 8L, 1L), .Label = c("S", "T", 
"U", "V", "W", "X", "Y", "Z"), class = "factor")), class = "data.frame", row.names = c(NA, 
-13L))

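Essentially, I need to iterate over the unique subsets of df1 defined by two factors (plot and lepsp). At each iteration, I need to find the rows of df2 that match df1 on a specific column. From those matching df2 rows, I then need to find the ones that also match df3 under a separate set of criteria, and return the rows that match on a different factor (plantsp). To summarize, using the data frames posted above: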

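1. For the i-th df1$plot and the j-th df1$lepsp, subset the rows of df2 whose df2$plot matches df1$plot. Similarly, subset the rows of df3 whose df3$lepsp matches df1$lepsp.
2. Within the subsets of df2 and df3 from step 1, return the rows of df2 for those levels of df3$plantsp that also occur in df2$plantsp.
3. Return a data frame that records the associated i-th df1$plot and j-th df1$lepsp together with the df2 rows that matched under the criteria in step 2.
4. Iterate over all levels of df1$lepsp within each df1$plot.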
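To make steps 1 and 2 concrete, here is a sketch of a single iteration (plot 2, lepC) in base R. It assumes compact copies of df2 and df3 with character columns in place of the factors from the dput() output; the names i, j, sub2, sub3 and hits are illustrative only.

```r
# Compact copies of df2 and df3 from above, using character columns instead of
# factors (an assumption made for brevity; %in% treats both the same way).
df2 <- data.frame(
  plot     = rep(1:2, each = 8),
  plantsp  = c("X", "Y", "H", "I", "J", "K", "L", "M",
               "O", "P", "Q", "S", "X", "Y", "Z", "U"),
  leafArea = c(1, 5, 5, 10, 20, 11, 12, 8, 1, 5, 10, 15, 20, 12, 13, 2),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  lepsp   = c("lepA", "lepA", "lepA", "lepB", "lepB",
              rep("lepC", 6), "lepD", "lepE"),
  plantsp = c("X", "Y", "Z", "X", "W", "T", "U", "V", "S", "X", "Z", "Z", "S"),
  stringsAsFactors = FALSE
)

i <- 2       # one value of df1$plot
j <- "lepC"  # one value of df1$lepsp recorded in that plot

# Step 1: df2 rows for this plot, df3 rows for this lep species
sub2 <- df2[df2$plot == i, ]
sub3 <- df3[df3$lepsp == j, ]

# Step 2: keep the df2 rows whose plantsp is also listed for this lepsp in df3
hits <- sub2[sub2$plantsp %in% sub3$plantsp, ]
hits
```

For this pair, hits holds the plot-2 rows for S, X, Z and U, which are exactly the lepC rows for plot 2 in the expected result.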

The resulting output should look like this:

result<- structure(list(plot = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L
), lepsp = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 5L
), .Label = c("lepA", "lepB", "lepC", "lepD", "lepE"), class = "factor"), 
lepcount = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 1L, 3L), plantsp = structure(c(3L, 
4L, 3L, 3L, 2L, 1L, 3L, 5L, 5L, 1L), .Label = c("S", "U", 
"X", "Y", "Z"), class = "factor"), leafarea = c(1L, 5L, 1L, 
1L, 2L, 15L, 20L, 13L, 13L, 15L)), class = "data.frame", row.names = c(NA, 
-10L))

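Given the nested structure, I'm having a hard time figuring out how to put all the pieces together, but I know the following functions might be useful: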

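pieces <- list()
for (i in unique(df1$plot)) {                          # plot is integer, so no levels()
  for (j in unique(as.character(df1$lepsp[df1$plot == i]))) {
    sub1 <- df2[df2$plot == i, ]                      # step 1: df2 rows in this plot
    sub2 <- df3[df3$lepsp == j, ]                     # step 1: df3 rows for this lepsp
    hits <- sub1[sub1$plantsp %in% sub2$plantsp, ]    # step 2: plants found in both
    if (nrow(hits) == 0) next                         # no shared plants: skip pair
    cnt <- df1$count[df1$plot == i & df1$lepsp == j]
    pieces[[length(pieces) + 1]] <- data.frame(       # step 3: tag with plot/lepsp
      plot = i, lepsp = j, lepcount = cnt,
      plantsp = hits$plantsp, leafarea = hits$leafArea)
  }
}
res <- do.call(rbind, pieces)                          # stack all iterations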

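We can keep the datasets in a list and use merge together with Reduce. By default merge joins on all columns the two data frames share, so df1 and df2 are first joined on plot, and that result is then joined to df3 on lepsp and plantsp, which is the matching described in steps 1 and 2: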

out <- Reduce(function(...) merge(...), list(df1, df2, df3))
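The Reduce call relies on the data frames sharing exactly the right column names. A sketch of the same join with the keys written out explicitly, assuming compact character-column copies of the three data frames (not the factor versions from dput()), and renaming count and leafArea to match the layout of result:

```r
# Compact copies of df1, df2, df3 (character columns instead of factors; an assumption)
df1 <- data.frame(plot  = c(1, 1, 1, 2, 2, 2),
                  lepsp = c("lepA", "lepB", "lepC", "lepC", "lepD", "lepE"),
                  count = c(1, 2, 3, 4, 1, 3), stringsAsFactors = FALSE)
df2 <- data.frame(plot     = rep(1:2, each = 8),
                  plantsp  = c("X", "Y", "H", "I", "J", "K", "L", "M",
                               "O", "P", "Q", "S", "X", "Y", "Z", "U"),
                  leafArea = c(1, 5, 5, 10, 20, 11, 12, 8,
                               1, 5, 10, 15, 20, 12, 13, 2),
                  stringsAsFactors = FALSE)
df3 <- data.frame(lepsp   = c("lepA", "lepA", "lepA", "lepB", "lepB",
                              rep("lepC", 6), "lepD", "lepE"),
                  plantsp = c("X", "Y", "Z", "X", "W", "T", "U", "V",
                              "S", "X", "Z", "Z", "S"),
                  stringsAsFactors = FALSE)

# Join df1 to df3 on lepsp (candidate plants per lep species), then keep only
# the candidates recorded in df2 for the same plot (join on plot + plantsp).
out <- merge(merge(df1, df3, by = "lepsp"), df2, by = c("plot", "plantsp"))

# Rename and reorder columns to match `result`
names(out)[names(out) == "count"]    <- "lepcount"
names(out)[names(out) == "leafArea"] <- "leafarea"
out <- out[order(out$plot, out$lepsp),
           c("plot", "lepsp", "lepcount", "plantsp", "leafarea")]
out
```

This yields the same ten rows as result (row order within a plot/lepsp group aside). Spelling out by also protects against accidental joins if the data frames later gain other columns with the same name.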

Or, using the tidyverse:

library(dplyr)
library(purrr)
list(df1, df2, df3) %>%
reduce(inner_join)
