%in% but for similar-looking strings in R

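I want to find which strings in a column of one data frame are similar to the strings in a column of another data frame. For example, in df1 I have: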

nombres
Acesco Corporation
Exito S.A
AMI 
Renault

And in df2 I have:

nombres
Acesco
Exito 
AMI 
Renault

I want a function like %in% that gives this output: Acesco, Exito, AMI
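For reference, plain %in% only reports exact matches, which is why it misses most of these pairs. A small sketch, with the two data frames rebuilt by hand from the listings above (trailing spaces kept as shown):

```r
# Data frames rebuilt from the question's listings (column "nombres")
df1 <- data.frame(nombres = c("Acesco Corporation", "Exito S.A", "AMI ", "Renault"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(nombres = c("Acesco", "Exito ", "AMI ", "Renault"),
                  stringsAsFactors = FALSE)

# %in% tests exact equality, element by element
df1$nombres %in% df2$nombres
#> [1] FALSE FALSE  TRUE  TRUE
```

Only "AMI " and "Renault" are found, because "Acesco Corporation" is not literally equal to "Acesco".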

We can use adist:

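txt1 <- c('nombres',
          'Acesco Corporation',
          'Exito S.A',
          'AMI ',
          'Renault')
txt2 <- c('nombres',
          'Acesco',
          'Exito',
          'AMI',
          'Renault')
# adist() returns one row per txt1 element and one column per txt2 element;
# after transposing, each column of dist_matrix corresponds to one element of txt1
dist_matrix <- data.frame(t(adist(txt1, txt2)))
# for each txt1 element, pick the txt2 element with the smallest edit distance
txt2[sapply(dist_matrix, which.min)]
#> [1] "nombres" "Acesco"  "Exito"   "AMI"     "Renault"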
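A variant of the same idea, sketched as a reusable helper. `closest_match` and its `max_dist` cutoff are hypothetical names introduced here, not part of base R, and "Xyz Ltd" is a made-up name used to show a non-match. It works on the untransposed matrix (one row per query string) and returns NA when even the best candidate is more than `max_dist` edits away, so unrelated names are not forced onto a match.

```r
# Hypothetical helper: nearest candidate by edit distance, or NA if too far
closest_match <- function(x, candidates, max_dist = Inf) {
  d <- adist(x, candidates)            # rows = x, columns = candidates
  best <- apply(d, 1, which.min)       # column index of the nearest candidate
  best_dist <- apply(d, 1, min)        # edit distance to that candidate
  best[best_dist > max_dist] <- NA     # reject matches that are too far away
  candidates[best]
}

txt2 <- c("Acesco", "Exito", "AMI", "Renault")

closest_match(c("Acesco Corporation", "Exito S.A", "AMI "), txt2)
# gives "Acesco", "Exito", "AMI"

closest_match(c("Exito S.A", "Xyz Ltd"), txt2, max_dist = 4)
# gives "Exito" and NA ("Xyz Ltd" is at least 6 edits from every candidate)
```

Choosing `max_dist` is a judgement call: a fixed cutoff treats long and short names alike, so a cutoff relative to string length may suit real data better.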

Here, adist computes the edit distance between every string in txt1 and every string in txt2.

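The (generalized) Levenshtein (or edit) distance between two strings s and t is the minimal, possibly weighted, number of insertions, deletions and substitutions needed to transform s into t.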
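To make the definition concrete for the unit-cost case, here is a minimal sketch of the textbook dynamic program. `lev` is a hypothetical helper for illustration only (base R already provides adist); with all three costs equal to 1 it should agree with adist on plain ASCII strings like these.

```r
# Unit-cost Levenshtein distance via the classic dynamic-programming table
lev <- function(s, t) {
  a <- strsplit(s, "")[[1]]
  b <- strsplit(t, "")[[1]]
  m <- length(a)
  n <- length(b)
  # D[i + 1, j + 1] = distance between the first i chars of s and the first j chars of t
  D <- matrix(0, m + 1, n + 1)
  D[, 1] <- 0:m   # turning i chars into "" takes i deletions
  D[1, ] <- 0:n   # turning "" into j chars takes j insertions
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      sub_cost <- if (a[i] == b[j]) 0 else 1
      D[i + 1, j + 1] <- min(D[i, j + 1] + 1,      # delete a[i]
                             D[i + 1, j] + 1,      # insert b[j]
                             D[i, j] + sub_cost)   # substitute (or keep) a[i]
    }
  }
  D[m + 1, n + 1]
}

lev("Exito S.A", "Exito")                 # 4 deletions
adist("Exito S.A", "Exito")[1, 1]         # same value from base R
```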

I found an approach that may work. It is not as good as @gaut's, but it may do the job:

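lapply(trimws(df2$nombres), function(x) grep(x, df1$nombres, fixed = TRUE))

For each (trimmed) name in df2, this gives its position(s) in df1, since the shorter names in df2 appear as substrings of the longer names in df1.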
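The same substring idea can be wrapped as an %in%-style operator. This is a sketch: `%partial_in%` is a hypothetical name, the data frames are rebuilt from the question's listings, and the extra "Toyota" row in df2 is made up to show a non-match. It trims the trailing spaces seen in the data and uses fixed = TRUE so that characters such as "." are matched literally rather than as regular-expression metacharacters.

```r
df1 <- data.frame(nombres = c("Acesco Corporation", "Exito S.A", "AMI ", "Renault"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(nombres = c("Acesco", "Exito ", "AMI ", "Renault", "Toyota"),
                  stringsAsFactors = FALSE)

# TRUE for each element of x that occurs as a literal substring of some element of table
`%partial_in%` <- function(x, table) {
  vapply(trimws(x),
         function(p) any(grepl(p, table, fixed = TRUE)),
         logical(1), USE.NAMES = FALSE)
}

df2$nombres %partial_in% df1$nombres
# keep the matching (trimmed) names, much like x[x %in% table]
trimws(df2$nombres)[df2$nombres %partial_in% df1$nombres]
```

This only catches names that appear verbatim inside the other column; misspellings still need an edit-distance approach such as adist.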

Hope this helps!

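I just want to extend @lasagna's answer by taking the diagonal of the distance matrix and binding it to the data frame, so that it can be used in later steps.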

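library(magrittr)  # provides the %>% pipe
df <- dist_matrix %>% as.matrix()
# diagonal entry i is the distance between txt1[i] and txt2[i]
mydists <- diag(df)
dist_matrix$mydist <- mydists
dist_matrix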
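When txt1 and txt2 are row-aligned (entry i of one should be compared only with entry i of the other), the same diagonal can be computed pair by pair without building the full n-by-n matrix. A minimal sketch, assuming both vectors have the same length and leaving out the header row; `pair_dist` and `result` are illustrative names:

```r
txt1 <- c("Acesco Corporation", "Exito S.A", "AMI ", "Renault")
txt2 <- c("Acesco", "Exito", "AMI", "Renault")

# Distance between txt1[i] and txt2[i] only; equals diag(adist(txt1, txt2))
pair_dist <- mapply(function(a, b) adist(a, b)[1, 1], txt1, txt2,
                    USE.NAMES = FALSE)

# Bind the distances next to the original strings for later steps
result <- data.frame(txt1, txt2, dist = pair_dist)
result
```

The pairwise version does n distance computations instead of n squared, which only starts to matter for long vectors.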
