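I need to create a third table (df3) containing the rows of the first table (df1) whose values in three of the six columns match a row of the second table (df2). The two starting tables, df1 and df2, do not have the same number of rows.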
Example:
df1
chain freq color  length type1 type2
AC    24   red    100    C     V2
BD    57   green  87     C     G5
OP    83   yellow 68     R     Q9
TP    28   blue   74     Y     P2
HP    23   yellow 39     A     U9

df2
chain freq color  length type1 type2
BD    45   blue   73     C     G5
YJ    57   green  78     N     Y6
TP    8    orange 98     Y     P2
HP    50   white  87     A     U9
ZS    87   red    98     P     N8
XC    8    green  98     T     N8
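The resulting table should contain the rows of df1 whose chain, type1 and type2 values match those of some row in df2. In this example, it would look like this: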
df3
chain freq color  length type1 type2
BD    57   green  87     C     G5
TP    28   blue   74     Y     P2
HP    23   yellow 39     A     U9
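I'm trying to avoid loops as much as possible. I've been looking into dplyr's functions, but I'm not very familiar with the package yet. Any ideas are appreciated.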
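We can use semi_join() from dplyr, which keeps only the rows of df1 that have a match in df2 on the columns given in by: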
library(dplyr)
semi_join(df1, df2, by = c('chain', 'type1', 'type2'))
# chain freq color length type1 type2
#1 BD 57 green 87 C G5
#2 TP 28 blue 74 Y P2
#3 HP 23 yellow 39 A U9
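If you would rather not depend on dplyr, the same filtering join can be sketched in base R without an explicit loop. This is an alternative technique, not part of the answer above. The data frames are rebuilt with data.frame() from the same values as the structure() calls in the Data section, the helper name row_key is hypothetical, and the sketch assumes the key values never contain a carriage return, which is used as the separator so that distinct key combinations cannot collide when the three columns are pasted together.

```r
# Base R sketch of a semi-join: keep rows of df1 whose (chain, type1, type2)
# combination also occurs somewhere in df2. No dplyr, no explicit loop.
df1 <- data.frame(
  chain  = c("AC", "BD", "OP", "TP", "HP"),
  freq   = c(24L, 57L, 83L, 28L, 23L),
  color  = c("red", "green", "yellow", "blue", "yellow"),
  length = c(100L, 87L, 68L, 74L, 39L),
  type1  = c("C", "C", "R", "Y", "A"),
  type2  = c("V2", "G5", "Q9", "P2", "U9")
)
df2 <- data.frame(
  chain  = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
  freq   = c(45L, 57L, 8L, 50L, 87L, 8L),
  color  = c("blue", "green", "orange", "white", "red", "green"),
  length = c(73L, 78L, 98L, 87L, 98L, 98L),
  type1  = c("C", "N", "Y", "A", "P", "T"),
  type2  = c("G5", "Y6", "P2", "U9", "N8", "N8")
)

keys <- c("chain", "type1", "type2")

# Hypothetical helper: collapse the key columns of each row into one string.
# Assumes "\r" never occurs in the data, so different key tuples stay distinct.
row_key <- function(df, cols) do.call(paste, c(df[cols], sep = "\r"))

# %in% is vectorised: one logical per df1 row, TRUE if its key is in df2.
df3 <- df1[row_key(df1, keys) %in% row_key(df2, keys), ]
rownames(df3) <- NULL  # renumber rows 1..n, like the semi_join() output
print(df3)
```

Like semi_join(), this keeps df1's row order and never duplicates a df1 row, even if df2 contains the same key combination more than once.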
Data
df1 <- structure(list(chain = c("AC", "BD", "OP", "TP", "HP"), freq = c(24L,
57L, 83L, 28L, 23L), color = c("red", "green", "yellow", "blue",
"yellow"), length = c(100L, 87L, 68L, 74L, 39L), type1 = c("C",
"C", "R", "Y", "A"), type2 = c("V2", "G5", "Q9", "P2", "U9")), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(chain = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
freq = c(45L, 57L, 8L, 50L, 87L, 8L), color = c("blue", "green",
"orange", "white", "red", "green"), length = c(73L, 78L,
98L, 87L, 98L, 98L), type1 = c("C", "N", "Y", "A", "P", "T"
), type2 = c("G5", "Y6", "P2", "U9", "N8", "N8")),
class = "data.frame", row.names = c(NA,
-6L))
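For comparison, here is a sketch of what an ordinary (mutating) inner join on the same three keys returns. It uses base R's merge() so it runs without dplyr, with the data frames rebuilt from the values above. A mutating join appends df2's freq, color and length columns (with .x/.y suffixes) and, by default, sorts the result by the keys. This is why a filtering join such as semi_join() fits this question better. Passing only df2's distinct key combinations to merge() gives the same rows as df3, though in key order rather than df1's order.

```r
# Same data as the structure() calls above, rebuilt with data.frame().
df1 <- data.frame(
  chain  = c("AC", "BD", "OP", "TP", "HP"),
  freq   = c(24L, 57L, 83L, 28L, 23L),
  color  = c("red", "green", "yellow", "blue", "yellow"),
  length = c(100L, 87L, 68L, 74L, 39L),
  type1  = c("C", "C", "R", "Y", "A"),
  type2  = c("V2", "G5", "Q9", "P2", "U9")
)
df2 <- data.frame(
  chain  = c("BD", "YJ", "TP", "HP", "ZS", "XC"),
  freq   = c(45L, 57L, 8L, 50L, 87L, 8L),
  color  = c("blue", "green", "orange", "white", "red", "green"),
  length = c(73L, 78L, 98L, 87L, 98L, 98L),
  type1  = c("C", "N", "Y", "A", "P", "T"),
  type2  = c("G5", "Y6", "P2", "U9", "N8", "N8")
)
keys <- c("chain", "type1", "type2")

# Mutating inner join: matches on the keys but also pulls in df2's columns,
# suffixing the clashing names with .x (from df1) and .y (from df2).
inner <- merge(df1, df2, by = keys)
print(names(inner))
# merge() sorts by the key columns, so rows are not in df1's order.
print(inner$chain)

# Restricting df2 to its distinct key combinations adds no df2 columns and
# cannot duplicate a df1 row; [names(df1)] restores df1's column order.
df3 <- merge(df1, unique(df2[keys]), by = keys)[names(df1)]
print(df3)
```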