I have two data frames:
df1:
word1 previousWord
a na
b a
c b
The other data frame looks like this:
df2: this contains more pairs than exist in df1; it contains every possible combination.
word1 previousWord Score
a a 1
a b .5
a c .9
b a .5
b b 1
b c .2
c a .9
c b .2
c c 1
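I want to find the rows in df2 whose pairs also appear in df1 (i.e. b-a, c-b), copy the Score from df2, and add it to df1 as a new column.
For example, the output would be: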
word1 previousWord Score
a na na
b a .5
c b .2
This is what I tried, but it seems to drop a lot of my data from df1. Changing the order of the arguments did not fix the problem.
df3<-merge(df2, df1, by = c("word1", "previousWord"))
Any help would be much appreciated.
You can use left_join() from dplyr here.
library(dplyr)
df3 <- left_join(df1, df2, by = c("word1", "previousWord"))
Output
word1 previousWord Score
1 a <NA> NA
2 b a 0.5
3 c b 0.2
Data
df1 <- structure(list(word1 = c("a", "b", "c"), previousWord = c(NA,
"a", "b")), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(word1 = c("a", "a", "a", "b", "b", "b", "c", "c",
"c"), previousWord = c("a", "b", "c", "a", "b", "c", "a", "b",
"c"), Score = c(1, 0.5, 0.9, 0.5, 1, 0.2, 0.9, 0.2, 1)), class = "data.frame", row.names = c(NA,
-9L))
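Base R alternative
If you would rather not load dplyr, the same left join can probably be done with merge() itself. By default merge() uses all = FALSE, which is an inner join, so any df1 row without a matching pair in df2 is dropped; that is most likely why your attempt lost rows. A minimal sketch, assuming the same toy data as above (rebuilt here with data.frame() only so the snippet runs on its own):

```r
# Same toy data as in the question; previousWord is NA for the first row
df1 <- data.frame(word1 = c("a", "b", "c"),
                  previousWord = c(NA, "a", "b"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(word1 = rep(c("a", "b", "c"), each = 3),
                  previousWord = rep(c("a", "b", "c"), times = 3),
                  Score = c(1, 0.5, 0.9, 0.5, 1, 0.2, 0.9, 0.2, 1),
                  stringsAsFactors = FALSE)

# Default merge() is an inner join: the (a, NA) row has no match and is dropped
inner <- merge(df1, df2, by = c("word1", "previousWord"))
nrow(inner)

# all.x = TRUE keeps every row of df1 (a left join) and fills Score with NA
# where df2 has no matching pair
df3 <- merge(df1, df2, by = c("word1", "previousWord"), all.x = TRUE)
df3
```

Note that merge() sorts the result on the key columns by default, so on real data the row order may differ from df1; left_join() keeps the row order of df1.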