I want to merge two data frames:
a<- data.frame(x=c(1,4,6,8,1,6,7,2),ID=c("132","14.","732","2..","132","14.","732","2.."),year=c(1,1,1,1,2,2,2,2))
b<- data.frame(y=c(2,7,5,5,1,1,2,3),ID=c("132","144","732","290","132","144","732","290"),year=c(1,1,1,1,2,2,2,2))
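The ID variable on which I want to merge the two data frames is not fully known in dataset a. I also want to merge by year. The IDs in a are known only up to a regular expression, but each regular expression identifies exactly one ID, so the match is one-to-one. In this example there is no ID such as "1.." in the dataset, so there are no ambiguous matches.
I would like something like this: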
output<-data.frame(y=c(2,7,5,5,1,1,2,3),x=c(1,4,6,8,1,6,7,2),ID=c("132","144","732","290","132","144","732","290"), year=c(1,1,1,1,2,2,2,2))
I tried using substr to strip the regular-expression part and then starts_with in the merge, but that did not work.
I get the following message:
Coercing pattern to a plain character vector
df_complete <- regex_inner_join(b,a, by=c("ID","year"))
Thanks, Stack Overflow…
Comment from @jblood94:
With a and b as data.tables: a[, regex_inner_join(b[year == .BY], .SD, by = "ID"), year]
-jblood94
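The per-year idea in this comment can also be sketched in base R, without data.table or fuzzyjoin. This is only an illustration under the question's one-to-one assumption: the helper name match_one_year is hypothetical, and the patterns are anchored with ^ and $ so that each one has to match a whole ID.

```r
a <- data.frame(x = c(1, 4, 6, 8, 1, 6, 7, 2),
                ID = c("132", "14.", "732", "2..", "132", "14.", "732", "2.."),
                year = c(1, 1, 1, 1, 2, 2, 2, 2),
                stringsAsFactors = FALSE)
b <- data.frame(y = c(2, 7, 5, 5, 1, 1, 2, 3),
                ID = c("132", "144", "732", "290", "132", "144", "732", "290"),
                year = c(1, 1, 1, 1, 2, 2, 2, 2),
                stringsAsFactors = FALSE)

# Hypothetical helper: match the regex IDs of `a` to the full IDs of `b`
# within a single year.
match_one_year <- function(yr) {
  a_y <- a[a$year == yr, ]
  b_y <- b[b$year == yr, ]
  # For each pattern in a_y, find the row of b_y whose ID it matches.
  idx <- vapply(a_y$ID, function(pattern) {
    hit <- grep(paste0("^", pattern, "$"), b_y$ID)
    stopifnot(length(hit) == 1)  # enforce the one-to-one assumption
    hit
  }, integer(1), USE.NAMES = FALSE)
  data.frame(y = b_y$y[idx], x = a_y$x, ID = b_y$ID[idx], year = yr,
             stringsAsFactors = FALSE)
}

# Apply the helper year by year and stack the pieces.
output <- do.call(rbind, lapply(unique(a$year), match_one_year))
```

The stopifnot() call makes the sketch fail loudly if a pattern matches zero or several IDs within a year, rather than silently producing extra or missing rows.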
You may want to use the fuzzyjoin package; you can then use regex_inner_join() directly:
fuzzyjoin::regex_inner_join(b,a, by="ID") %>% select(x,y,ID=ID.x)
Output:
x y ID
1 1 2 132
2 4 7 144
3 6 5 732
4 8 5 290
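Note that the call above joins on ID only. A minimal sketch of one way to match ID by regular expression and year exactly in a single call, assuming fuzzyjoin's fuzzy_inner_join() with a list of match functions (one per by column) and stringr's str_detect(); the name res is only illustrative. Passing year to regex_inner_join() would treat the numeric year as a pattern as well, which is probably where the "Coercing pattern to a plain character vector" message in the question comes from.

```r
library(fuzzyjoin)
library(dplyr)

a <- data.frame(x = c(1, 4, 6, 8, 1, 6, 7, 2),
                ID = c("132", "14.", "732", "2..", "132", "14.", "732", "2.."),
                year = c(1, 1, 1, 1, 2, 2, 2, 2),
                stringsAsFactors = FALSE)
b <- data.frame(y = c(2, 7, 5, 5, 1, 1, 2, 3),
                ID = c("132", "144", "732", "290", "132", "144", "732", "290"),
                year = c(1, 1, 1, 1, 2, 2, 2, 2),
                stringsAsFactors = FALSE)

res <- fuzzy_inner_join(
  b, a,
  by = c("ID", "year"),
  match_fun = list(
    # ID: the value from b must match the anchored pattern from a
    function(id, pattern) stringr::str_detect(id, paste0("^", pattern, "$")),
    # year: plain equality
    `==`
  )
) %>%
  # shared column names get .x (from b) and .y (from a) suffixes
  select(y, x, ID = ID.x, year = year.x)
```

With the example data this should give one row per ID and year, eight rows in total, carrying the full IDs from b.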