R function to find at least 2 matching words between 2 strings (applied to 2 vectors of strings)



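I have two sets of strings, Char and Char2. I want to know whether each Char contains at least 2 words from the corresponding Char2 (any two words count as a match). I have not gotten to the "at least 2 words" part yet; first I need to work out how to match any word between each pair of strings. Any help would be appreciated.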

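I tried a few different approaches using the stringr package; see below. I tried to adapt the solution Robert gave in his answer to this question: Detect multiple strings with dplyr and stringr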

library(stringr)  # provides str_detect(), used in the attempts below

shopping_list <- as.data.frame(c("good apples", "bag of apples", "bag of sugar", "milk x2"))
colnames(shopping_list) <- "Char"
shopping_list2 <- as.data.frame(c("good pears", "bag of sugar", "bag of flour", "sour milk x2"))
colnames(shopping_list2) <- "Char2"
shop = cbind(shopping_list , shopping_list2)
shop$Char = as.character(shop$Char)
shop$Char2 = as.character(shop$Char2)

# First attempt
sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
# Second attempt
str_detect(shop$Char, paste(shop$Char2, collapse = '|'))

I get the following results:

sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
good apples bag of apples  bag of sugar       milk x2 
FALSE         FALSE          TRUE         FALSE 

str_detect(shop$Char, paste(shop$Char2, collapse = '|'))
FALSE FALSE  TRUE FALSE
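
For reference, both attempts use each whole Char2 string as the pattern, so a match requires the entire phrase to occur inside the Char string. A minimal sketch of that behaviour, using base R's grepl() in place of str_detect() and the hypothetical vector names char and char2:

```r
char  <- c("good apples", "bag of apples", "bag of sugar", "milk x2")
char2 <- c("good pears", "bag of sugar", "bag of flour", "sour milk x2")

# The combined pattern is an alternation of complete phrases, not of words:
# "good pears|bag of sugar|bag of flour|sour milk x2"
pattern <- paste(char2, collapse = "|")

# Only "bag of sugar" contains one of those phrases in full
whole_phrase_hit <- grepl(pattern, char)
whole_phrase_hit
```

Note that this also compares every Char against every Char2, not row against row.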

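However, I am looking for these results:

FALSE TRUE TRUE TRUE

1) FALSE, because only 1 word matches
2) TRUE, because of "bag of" in both
3) TRUE, because of "bag of" in both
4) TRUE, because of "milk x2" in both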

Here is a function that can help:

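match_test <- function (string1, string2) {
  # split each string into words on single spaces
  words1 <- unlist(strsplit(string1, ' '))
  words2 <- unlist(strsplit(string2, ' '))
  # distinct words that appear in both strings
  common_words <- intersect(words1, words2)
  # TRUE when at least 2 words are shared
  length(common_words) > 1
}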

Here is an example:

string1 <- c("good apples" , "bag of apples", "bag of sugar", "milk x2")
string2 <- c("good pears" , "bag of sugar", "bag of flour", "sour milk x2")
vapply(seq_along(string1), function (k) match_test(string1[k], string2[k]), logical(1))
# [1] FALSE  TRUE  TRUE  TRUE
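
The same idea can be applied directly to the data frame columns from the question. The sketch below is a variant under a few assumptions that are not in the original answer: the helper name count_common_words, the min_words threshold variable, and splitting on runs of whitespace rather than on a single space.

```r
# Hypothetical helper (not in the original answer): number of distinct
# words shared by two strings, splitting on runs of whitespace.
count_common_words <- function(string1, string2) {
  words1 <- unlist(strsplit(string1, "\\s+"))
  words2 <- unlist(strsplit(string2, "\\s+"))
  length(intersect(words1, words2))
}

shop <- data.frame(
  Char  = c("good apples", "bag of apples", "bag of sugar", "milk x2"),
  Char2 = c("good pears", "bag of sugar", "bag of flour", "sour milk x2"),
  stringsAsFactors = FALSE
)

min_words <- 2  # assumed threshold: "at least 2 words"

# mapply walks both columns in parallel, one pair of strings per row
shop$n_common <- mapply(count_common_words, shop$Char, shop$Char2,
                        USE.NAMES = FALSE)
shop$match <- shop$n_common >= min_words
shop$match
# [1] FALSE  TRUE  TRUE  TRUE
```

Keeping the count in its own column makes it easy to change the threshold later without recomputing the word intersections.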
