How to create a table from the rows of a first table that match the rows of a second table on 3 columns in R



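I need to create a third table (df3) containing the rows of the first table (df1) whose values match a row of the second table (df2) in three of its six columns. The two starting tables, df1 and df2, do not have the same number of rows.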

Example:

df1
chain   freq   color   length  type1  type2
AC       24    red      100      C      V2
BD       57    green     87      C      G5
OP       83    yellow    68      R      Q9
TP       28    blue      74      Y      P2
HP       23    yellow    39      A      U9

df2
chain   freq   color   length  type1  type2
BD       45    blue      73      C      G5
YJ       57    green     78      N      Y6
TP        8    orange    98      Y      P2
HP       50    white     87      A      U9
ZS       87    red       98      P      N8
XC        8    green     98      T      N8

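The resulting table should contain the rows of df1 that match a row of df2 on the chain, type1 and type2 columns. In this example, it would look like this: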

df3                                
chain   freq    color  length  type1  type2
BD       57     green   87      C      G5
TP       28     blue    74      Y      P2
HP       23     yellow  39      A      U9

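I'm trying to avoid loops as much as possible. I've been looking into dplyr's functions, but I'm not very familiar with the package yet. Any ideas are appreciated.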

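We can use semi_join, which keeps the rows of df1 that have a match in df2 on the given columns, without adding any of df2's columns: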

library(dplyr)
semi_join(df1, df2, by = c('chain', 'type1', 'type2'))
#   chain freq  color length type1 type2
#1    BD   57  green     87     C    G5
#2    TP   28   blue     74     Y    P2
#3    HP   23 yellow     39     A    U9
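To see why a filtering join fits here, the sketch below contrasts semi_join with dplyr's inner_join on the same key columns. It rebuilds the question's data with data.frame() (same values as the dput() output under "Data") and assumes a dplyr version that accepts a character vector for `by`; `keys`, `filtered` and `joined` are illustrative names, not part of the original answer.

```r
library(dplyr)

# Question's data, rebuilt with data.frame() (same values as the dput() output)
df1 <- data.frame(
  chain  = c("AC", "BD", "OP", "TP", "HP"),
  freq   = c(24L, 57L, 83L, 28L, 23L),
  color  = c("red", "green", "yellow", "blue", "yellow"),
  length = c(100L, 87L, 68L, 74L, 39L),
  type1  = c("C", "C", "R", "Y", "A"),
  type2  = c("V2", "G5", "Q9", "P2", "U9"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  chain  = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
  freq   = c(45L, 57L, 8L, 50L, 87L, 8L),
  color  = c("blue", "green", "orange", "white", "red", "green"),
  length = c(73L, 78L, 98L, 87L, 98L, 98L),
  type1  = c("C", "N", "Y", "A", "P", "T"),
  type2  = c("G5", "Y6", "P2", "U9", "N8", "N8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector holding the three key columns
keys <- c("chain", "type1", "type2")

# Filtering join: keeps matching rows of df1, with only df1's columns
filtered <- semi_join(df1, df2, by = keys)

# Mutating join: also appends df2's non-key columns; the shared names
# freq, color and length get the default ".x" / ".y" suffixes
joined <- inner_join(df1, df2, by = keys)

print(names(filtered))
print(names(joined))
```

Besides adding df2's columns, inner_join would repeat a df1 row if its (chain, type1, type2) combination occurred more than once in df2, whereas semi_join returns each matching df1 row at most once.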

Data

df1 <- structure(list(chain = c("AC", "BD", "OP", "TP", "HP"), freq = c(24L, 
57L, 83L, 28L, 23L), color = c("red", "green", "yellow", "blue", 
"yellow"), length = c(100L, 87L, 68L, 74L, 39L), type1 = c("C", 
"C", "R", "Y", "A"), type2 = c("V2", "G5", "Q9", "P2", "U9")), class = "data.frame", row.names = c(NA, 
-5L))
df2 <- structure(list(chain = c("BD", "YJ", "TP", "HP", "ZS", "XC"), 
freq = c(45L, 57L, 8L, 50L, 87L, 8L), color = c("blue", "green", 
"orange", "white", "red", "green"), length = c(73L, 78L, 
98L, 87L, 98L, 98L), type1 = c("C", "N", "Y", "A", "P", "T"
), type2 = c("G5", "Y6", "P2", "U9", "N8", "N8")), 
class = "data.frame", row.names = c(NA, 
-6L))
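If you would rather avoid the dplyr dependency, a base R sketch of the same semi join builds one composite key string per row and compares the keys with the vectorised `%in%`, so no explicit loop is needed. It assumes the separator "\r" never occurs inside the key values; `keys`, `key1` and `key2` are illustrative names.

```r
# Question's data, rebuilt with data.frame() (same values as the dput() output)
df1 <- data.frame(
  chain  = c("AC", "BD", "OP", "TP", "HP"),
  freq   = c(24L, 57L, 83L, 28L, 23L),
  color  = c("red", "green", "yellow", "blue", "yellow"),
  length = c(100L, 87L, 68L, 74L, 39L),
  type1  = c("C", "C", "R", "Y", "A"),
  type2  = c("V2", "G5", "Q9", "P2", "U9"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  chain  = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
  freq   = c(45L, 57L, 8L, 50L, 87L, 8L),
  color  = c("blue", "green", "orange", "white", "red", "green"),
  length = c(73L, 78L, 98L, 87L, 98L, 98L),
  type1  = c("C", "N", "Y", "A", "P", "T"),
  type2  = c("G5", "Y6", "P2", "U9", "N8", "N8"),
  stringsAsFactors = FALSE
)

keys <- c("chain", "type1", "type2")

# Collapse each row's key columns into a single string.
# Assumption: "\r" never appears inside chain, type1 or type2.
key1 <- do.call(paste, c(df1[keys], sep = "\r"))
key2 <- do.call(paste, c(df2[keys], sep = "\r"))

# Keep df1 rows whose composite key also occurs in df2 (vectorised, no loop)
df3 <- df1[key1 %in% key2, , drop = FALSE]
rownames(df3) <- NULL  # renumber rows from 1, like the semi_join output
print(df3)
```

Like semi_join, this keeps df1's row order and columns and returns each matching df1 row once, even if df2 contains duplicate key combinations.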
