Searching for a set of words within the same sentence in R



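I am trying to search for a set of words within the context of the same sentence. For example, I want to find out whether the words "not" and "sugar" both occur within a single sentence.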

string = c(
"I do not like sugar. However, I like coffee.", 
"I like sugar. But I do not like coffee.")

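Both texts contain the words "not" and "sugar", but only the first text has both words in the same sentence. In the second text, "not" and "sugar" appear in different sentences.

I would like to return TRUE for the first text and FALSE for the second.

I tried grepl("not\\ssugar", string)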

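Your attempt was very close. The character class [^.,!?:;] allows any character except punctuation between not and sugar.

string = c(
  "I do not like sugar. However, I like coffee.", 
  "I like sugar. But I do not like coffee.",
  "I do not like coffee. But I love sugar.")
# inside a character class the dot is literal, so it needs no escaping
grepl("not[^.,!?:;]*sugar", string)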
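Note that the pattern above only finds "not" followed later by "sugar"; a sentence with the words in the opposite order would be missed. A minimal sketch of an order-independent variant, assuming the same punctuation set separates sentences, joins both orders with an alternation. The helper name same_sentence_regex and the third test string are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: TRUE if w1 and w2 both occur, in either order,
# with none of the listed punctuation characters between them.
same_sentence_regex <- function(x, w1, w2, sep = "[^.,!?:;]*") {
  # "|" has the lowest precedence, so this reads (w1 ... w2) OR (w2 ... w1)
  pattern <- paste0(w1, sep, w2, "|", w2, sep, w1)
  grepl(pattern, x, ignore.case = TRUE)
}

texts <- c(
  "I do not like sugar. However, I like coffee.",
  "I like sugar. But I do not like coffee.",
  "Sugar is not good. Coffee is fine.")
res <- same_sentence_regex(texts, "not", "sugar")
print(res)
```

Because the words are matched as plain substrings, "not" would also match inside a word such as "knot"; wrapping each word in \\b word boundaries is one way to tighten this.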

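Here is one possible approach, certainly not the most efficient nor the easiest to read(!). The upside is that it even gives you the actual sentences. I have kept the set of words to test separate from the rest of the code, so that you can test the co-occurrence of any number of words.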

string = c(
  "I do not like sugar. However, I like coffee.", 
  "I like sugar. But I do not like coffee.")
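checkwords = lapply(string,
  FUN = function(str, words = c("sugar", "not"))
{
  # split the text into sentences at each period
  sapply(strsplit(str, "\\.")[[1]], FUN = function(el) {
    # TRUE only if every word occurs in this sentence
    all(sapply(words,
           FUN = function(wd) grepl(wd, el)))
  })
})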
# yes this can be a one line instruction...
checkwords
 [[1]]
     I do not like sugar  However, I like coffee 
               TRUE                   FALSE 
 [[2]]
              I like sugar  But I do not like coffee 
                     FALSE                     FALSE 

Then check, for each element of the initial vector string, whether there is at least one TRUE:

sapply(checkwords, any)
[1]  TRUE FALSE
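The two steps above can also be folded into a single function. The following is only a sketch under some assumptions: sentences are taken to end at ".", "!" or "?", words are matched case-insensitively as whole words, and the name words_in_same_sentence is hypothetical.

```r
# Hypothetical helper: for each text, does any one sentence contain all `words`?
words_in_same_sentence <- function(text, words) {
  # one character vector of sentences per input text
  sentences <- strsplit(text, "[.!?]+")
  vapply(sentences, function(s) {
    # start with TRUE for every sentence, then require each word in turn
    ok <- rep(TRUE, length(s))
    for (w in words) {
      ok <- ok & grepl(paste0("\\b", w, "\\b"), s, ignore.case = TRUE, perl = TRUE)
    }
    any(ok)
  }, logical(1))
}

texts <- c(
  "I do not like sugar. However, I like coffee.",
  "I like sugar. But I do not like coffee.")
res <- words_in_same_sentence(texts, c("sugar", "not"))
print(res)
```

Unlike the version above, this one does not return the matching sentences themselves, only one TRUE or FALSE per text.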
