r - Return matches between three data frames using nested criteria iteration



I have the following three data frames:

df1<- structure(list(plot = c(1L, 1L, 1L, 2L, 2L, 2L), lepsp = structure(c(1L, 
2L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD", 
"lepE"), class = "factor"), count = c(1L, 2L, 3L, 4L, 1L, 3L)), class = "data.frame", 
row.names = c(NA, -6L))
df2<-structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L), plantsp = structure(c(12L, 13L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 11L), .Label = c("H", 
"I", "J", "K", "L", "M", "O", "P", "Q", "S", "U", "X", "Y", "Z"
), class = "factor"), leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 
12L, 8L, 1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L)), class = "data.frame", row.names = c(NA, 
-16L))
df3<-structure(list(lepsp = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD", 
"lepE"), class = "factor"), plantsp = structure(c(6L, 7L, 8L, 
6L, 5L, 2L, 3L, 4L, 1L, 6L, 8L, 8L, 1L), .Label = c("S", "T", 
"U", "V", "W", "X", "Y", "Z"), class = "factor")), class = "data.frame", row.names = c(NA, 
-13L))

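Essentially, I need to iterate over unique subsets of df1 defined by two factor levels. At each iteration, I need to find the matches between df1 and df2 on a particular column. From the rows of df2 that match df1, I then need to find matches with df3 based on a separate set of criteria, and return the rows that match on a different factor. To summarize, specifically for the data frames posted above: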

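  1. For the ith df1$plot and jth df1$lepsp, subset the rows of df2 whose df2$plot matches df1$plot. Similarly, subset the rows of df3 whose df3$lepsp matches df1$lepsp
  2. Within the subsets of df2 and df3 from step 1, return the rows of df2 for those levels of df3$plantsp that also occur in df2$plantsp
  3. Return a data frame indexed by the associated ith df1$plot and jth df1$lepsp, together with the rows of df2 that matched under the criteria in step 2
  4. Iterate over all levels of df1$lepsp within each df1$plot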
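
A single iteration of steps 1 to 3 can be sketched as follows. This is only an illustration, not a full solution: it assumes the pair i = 2 and j = "lepC", and it rebuilds the three data frames with plain character columns instead of factors to keep the block short. The names sub2, sub3, hit and piece are made up for the example.

```r
# Compact copies of the posted data, with character columns instead of
# factors (an assumption made for brevity).
df1 <- data.frame(
  plot  = c(1L, 1L, 1L, 2L, 2L, 2L),
  lepsp = c("lepA", "lepB", "lepC", "lepC", "lepD", "lepE"),
  count = c(1L, 2L, 3L, 4L, 1L, 3L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  plot     = rep(1:2, each = 8),
  plantsp  = c("X", "Y", "H", "I", "J", "K", "L", "M",
               "O", "P", "Q", "S", "X", "Y", "Z", "U"),
  leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 12L, 8L,
               1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  lepsp   = rep(c("lepA", "lepB", "lepC", "lepD", "lepE"),
                times = c(3, 2, 6, 1, 1)),
  plantsp = c("X", "Y", "Z", "X", "W", "T", "U", "V", "S", "X", "Z", "Z", "S"),
  stringsAsFactors = FALSE
)

# The (plot, lepsp) pair handled in this one iteration (chosen for illustration).
i <- 2L
j <- "lepC"

# Step 1: rows of df2 in plot i, and rows of df3 for lepsp j.
sub2 <- df2[df2$plot == i, ]
sub3 <- df3[df3$lepsp == j, ]

# Step 2: rows of the df2 subset whose plantsp also occurs in the df3 subset.
hit <- sub2[sub2$plantsp %in% sub3$plantsp, ]

# Step 3: index the matched rows by the current plot and lepsp.
piece <- data.frame(
  plot     = i,
  lepsp    = j,
  lepcount = df1$count[df1$plot == i & df1$lepsp == j],
  plantsp  = hit$plantsp,
  leafarea = hit$leafArea
)
print(piece)
```

For this pair the sketch keeps the plot 2 plants S, X, Z and U, which are the same plantsp values listed for plot 2 and lepC in the desired output, although in a different row order.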

The resulting output would look like this:

result<- structure(list(plot = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L
), lepsp = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 5L
), .Label = c("lepA", "lepB", "lepC", "lepD", "lepE"), class = "factor"), 
lepcount = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 1L, 3L), plantsp = structure(c(3L, 
4L, 3L, 3L, 2L, 1L, 3L, 5L, 5L, 1L), .Label = c("S", "U", 
"X", "Y", "Z"), class = "factor"), leafarea = c(1L, 5L, 1L, 
1L, 2L, 15L, 20L, 13L, 13L, 15L)), class = "data.frame", row.names = c(NA, 
-10L))

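Given the nested structure, I am having a hard time figuring out how to put all the pieces together, but I think something along the following lines could be useful: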

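# One piece per (plot, lepsp) pair in df1; the pieces are row-bound at the end
pieces <- list()
for (i in unique(df1$plot)) {
  for (j in as.character(df1$lepsp[df1$plot == i])) {
    sub2  <- df2[df2$plot == i, ]                          # step 1: plants recorded in plot i
    hosts <- as.character(df3$plantsp[df3$lepsp == j])     # step 1: plants associated with lepsp j
    hit   <- sub2[as.character(sub2$plantsp) %in% hosts, ] # step 2: plants meeting both criteria
    if (nrow(hit) == 0) next                               # nothing to report for this pair
    pieces[[length(pieces) + 1]] <- data.frame(            # step 3: index the matches by plot and lepsp
      plot = i, lepsp = j, lepcount = df1$count[df1$plot == i & df1$lepsp == j],
      plantsp = hit$plantsp, leafarea = hit$leafArea)
  }
}
result <- do.call(rbind, pieces)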

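We can put the datasets in a list and use merge with Reduce. By default, merge joins on the columns that the two data frames have in common: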

out <- Reduce(function(...) merge(...), list(df1, df2, df3))
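
To make the join keys visible, the same two merges can be written out with explicit by arguments. The following is a sketch under assumptions: it uses compact copies of the posted data with character columns instead of factors, and the intermediate name step1 and the final sorting and column order are illustrative choices only.

```r
# Compact copies of the posted data, with character columns instead of
# factors (an assumption made for brevity).
df1 <- data.frame(
  plot  = c(1L, 1L, 1L, 2L, 2L, 2L),
  lepsp = c("lepA", "lepB", "lepC", "lepC", "lepD", "lepE"),
  count = c(1L, 2L, 3L, 4L, 1L, 3L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  plot     = rep(1:2, each = 8),
  plantsp  = c("X", "Y", "H", "I", "J", "K", "L", "M",
               "O", "P", "Q", "S", "X", "Y", "Z", "U"),
  leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 12L, 8L,
               1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  lepsp   = rep(c("lepA", "lepB", "lepC", "lepD", "lepE"),
                times = c(3, 2, 6, 1, 1)),
  plantsp = c("X", "Y", "Z", "X", "W", "T", "U", "V", "S", "X", "Z", "Z", "S"),
  stringsAsFactors = FALSE
)

# First merge: df1 and df2 share only the plot column, so every lepsp in a
# plot is paired with every plantsp recorded in that plot.
step1 <- merge(df1, df2, by = "plot")

# Second merge: keep only the lepsp/plantsp pairs that are listed in df3.
out <- merge(step1, df3, by = c("lepsp", "plantsp"))

# Tidy up: sort the rows and put the columns in a readable order.
out <- out[order(out$plot, out$lepsp, out$plantsp),
           c("plot", "lepsp", "count", "plantsp", "leafArea")]
rownames(out) <- NULL
print(out)
```

Because both merges are inner joins, (plot, lepsp) pairs with no matching plantsp are dropped from the result. Passing all.x = TRUE to merge would keep them with NA values, if that is preferred.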

Or, using the tidyverse:

library(dplyr)
library(purrr)
list(df1, df2, df3) %>%
reduce(inner_join)
