Search and extract strings by comparing each value against all values in a column in R (even when the values are in a different order)



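I have a column, and I am trying to compare each value against every value in that column (including itself), from the first to the last, and extract the matching strings, even when the values appear in a different order (for example, rows 20 and 24 of the output). The output data frame should be df_out.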

Col_1 = c("AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB")
df_input = data.frame(Col_1)

The output data frame is as follows:

Col_1 = c("AB,CD,EF","AB,CD,EF","AB,CD,EF","AB,CD,EF","AB,CD,EF",  "AB,CD,EF,GH","AB,CD,EF,GH","AB,CD,EF,GH","AB,CD,EF,GH","AB,CD,EF,GH","MN,OP","MN,OP","MN,OP","MN,OP","MN,OP", "AB,MN,OP","AB,MN,OP","AB,MN,OP","AB,MN,OP","AB,MN,OP",
"OP,MN,AB","OP,MN,AB","OP,MN,AB","OP,MN,AB","OP,MN,AB")
Col_2 = c("AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB",  "AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB",  "AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB", "AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB",
"AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB")

match = c("Complete Match","AB,CD","NO Matching","AB","AB","AB,CD","Complete Match","NO Matching","AB","AB","NO Matching","NO Matching","Complete Match",
"MN,OP","MN,OP","AB","AB","MN,OP","Complete Match","Complete Match","AB","AB","MN,OP","Complete Match","Complete Match")

df_out = data.frame(Col_1,Col_2,match)

I have tried this with grepl, but did not get the desired output.

Here is a (somewhat messy) solution:

funcmatch <- function(a, b) {
  # Positions in b of each element of a (NA where an element of a is absent from b)
  ma <- match(a, b)
  if (all(is.na(ma)))
    return("NO MATCH")
  else if (setequal(a, b))
    # Same elements on both sides, regardless of order
    return("COMPLETE MATCH")
  else
    # Keep the elements of a that were found in b, in the order of a
    return(paste0(a[!is.na(ma)], collapse = ","))
}
# Col_1 and Col_2 are the 25-element vectors defined for df_out above
mapply(funcmatch, strsplit(Col_1, ","), strsplit(Col_2, ","))

Output:

 [1] "COMPLETE MATCH" "AB,CD,EF"       "NO MATCH"
 [4] "AB"             "AB"             "AB,CD,EF"
 [7] "COMPLETE MATCH" "NO MATCH"       "AB"
[10] "AB"             "NO MATCH"       "NO MATCH"
[13] "COMPLETE MATCH" "MN,OP"          "MN,OP"
[16] "AB"             "AB"             "MN,OP"
[19] "COMPLETE MATCH" "COMPLETE MATCH" "AB"
[22] "AB"             "OP,MN"          "COMPLETE MATCH"
[25] "COMPLETE MATCH"
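
If you want to start from the original five-row df_input rather than typing out the 25-row vectors by hand, something along these lines might work. This is only a sketch in base R: the helper name compare_sets and the variable pairs are made up for illustration, and the comparison treats each comma-separated string as a set, which is my reading of "even when the order is different".

```r
# Hypothetical helper (not part of the answer above): compares two
# character vectors as sets, ignoring the order of their elements.
compare_sets <- function(a, b) {
  common <- intersect(a, b)  # shared elements, in the order they appear in a
  if (length(common) == 0) {
    "NO MATCH"
  } else if (setequal(a, b)) {
    "COMPLETE MATCH"
  } else {
    paste(common, collapse = ",")
  }
}

df_input <- data.frame(
  Col_1 = c("AB,CD,EF", "AB,CD,EF,GH", "MN,OP", "AB,MN,OP", "OP,MN,AB"),
  stringsAsFactors = FALSE
)

# Build every ordered pair of values. expand.grid varies its first
# argument fastest, so Col_2 is listed first and the columns are
# reordered afterwards; Col_1 then changes slowest, as in df_out.
pairs <- expand.grid(
  Col_2 = df_input$Col_1,
  Col_1 = df_input$Col_1,
  stringsAsFactors = FALSE
)[, c("Col_1", "Col_2")]

# Split each string on commas and compare the pieces pair by pair.
pairs$match <- mapply(
  compare_sets,
  strsplit(pairs$Col_1, ",", fixed = TRUE),
  strsplit(pairs$Col_2, ",", fixed = TRUE),
  USE.NAMES = FALSE
)

print(pairs)
```

The labels "COMPLETE MATCH" and "NO MATCH" follow the answer above; swap in "Complete Match" and "NO Matching" if the result has to match the wording in the question. Partial matches are reported in the order of the elements in Col_1, so "OP,MN,AB" against "MN,OP" yields "OP,MN".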