I have a vector made up of several strings (in R):
vec <- c("the cat the cat ran up the tree tree", "the dog ran up the up the tree",
"the squirrel squirrel ran up the tree")
I need to remove the repeated words from each individual string.
Desired output:
"the cat ran up the tree"
"the dog ran up the tree"
"the squirrel ran up the tree"
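I have already tried the solution from "Removing duplicate words in a string in R". However, that only merges the multiple strings into one complex string.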
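We can use gsub to match a repeated two-word group or a repeated single word, and replace the whole run with a single copy of the captured group: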
gsub("((\\w+\\s+\\w+\\s?)|(\\w+\\s+))\\1+", "\\1", vec)
#[1] "the cat ran up the tree"
#[2] "the dog ran up the tree"
#[3] "the squirrel ran up the tree"
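
The pattern above only looks for repeats of one or two words. As a rough sketch of how the same backreference idea could be generalised to repeated phrases of any length, the variant below uses `perl = TRUE`, anchors on word boundaries, and wraps the call in a helper named `dedupe_phrases`. The helper name, the pattern, and the switch to PCRE are all assumptions made for illustration and are not part of the answer above.

```r
# Hypothetical helper (not from the answer above): collapse any run of
# consecutively repeated words or phrases into a single copy.
dedupe_phrases <- function(x) {
  # \\b(\\w+(?:\\s+\\w+)*) : a phrase of one or more whole words (group 1)
  # (?:\\s+\\1\\b)+        : one or more immediate repeats of that phrase
  gsub("\\b(\\w+(?:\\s+\\w+)*)(?:\\s+\\1\\b)+", "\\1", x, perl = TRUE)
}

vec <- c("the cat the cat ran up the tree tree",
         "the dog ran up the up the tree",
         "the squirrel squirrel ran up the tree")

dedupe_phrases(vec)
#[1] "the cat ran up the tree"
#[2] "the dog ran up the tree"
#[3] "the squirrel ran up the tree"
```

Because gsub is vectorised, each element of vec is processed separately and the result stays a vector of three strings. Only adjacent repeats are removed, so the second "the" in each sentence is kept. The pattern relies on backtracking, so it may be slow on very long strings.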