R - How to remove punctuation excluding negations?



Suppose I have the following sentence:


s = c("I don't want to remove punctuation for negations. Instead, I want to remove only general punctuation. For example, keep I wouldn't like it but remove Inter's fan or Man city's fan.")

I would like to get the following result:

"I don't want to remove punctuation for negations Instead I want to remove only general punctuation For example keep I wouldn't like it but remove Inter fan or Man city fan."

Now, if I simply use the code below, I remove both the 's and the ' in the negations.


library(stringr)
s %>% str_replace_all("['']s\\b|[^[:alnum:][:blank:]@_]", " ")
"I don t want to remove punctuation for negations  Instead  I want to remove only general punctuation           For example  keep I wouldn t like it but remove Inter  fan or Man city  fan "

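In summary, I need code that removes general punctuation, including the "'s", except in negations, which I want to keep in their original form.

Can anyone help me?

Thanks!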

You can use a lookahead (?!t) to test that the [:punct:] character is not followed by a t.

gsub("[[:punct:]](?!t)\\w?", "", s, perl=TRUE)
#[1] "I don't want to remove punctuation for negations Instead I want to remove only general punctuation For example keep I wouldn't like it but remove Inter fan or Man city fan"

If you want to be stricter, you can additionally test with a lookbehind (?<!n) that there is no n before it.

gsub("(?<!n)[[:punct:]](?!t)\\w?", "", s, perl=TRUE)
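One thing to watch with the stricter pattern: (?<!n) applies to every punctuation character, so a period directly after a word ending in n (for example "punctuation.") is kept as well. Below is a minimal sketch of a variant, assuming the goal is to keep the apostrophe only when it sits between n and t. The sample string x is made up for illustration and is not from the question.

```r
# Hypothetical short sample, chosen so that two words end in "n" before a period
x <- "I don't want punctuation. Keep wouldn't, drop Inter's fan."

# Pattern from above: punctuation is skipped whenever it is preceded by "n"
# OR followed by "t", so the periods after "punctuation" and "fan" survive
lookbehind_version <- gsub("(?<!n)[[:punct:]](?!t)\\w?", "", x, perl = TRUE)

# Variant: skip the punctuation only when it is preceded by "n" AND is the
# start of "'t" (a lookbehind nested inside the negative lookahead)
nt_only_version <- gsub("(?!(?<=n)'t)[[:punct:]]\\w?", "", x, perl = TRUE)

print(lookbehind_version)
# [1] "I don't want punctuation. Keep wouldn't drop Inter fan."
print(nt_only_version)
# [1] "I don't want punctuation Keep wouldn't drop Inter fan"
```

Both versions still rely on \\w? to swallow the s of 's, which means any single letter or digit directly after a removed punctuation character is dropped too.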

Or, to restrict it to just 't (thanks to @chris-ruehlemann):

gsub("(?!'t)[[:punct:]]\\w?", "", s, perl=TRUE)

Or remove every punct except ', and also remove 's:

gsub("[^'[:^punct:]]|'s", "", s, perl = TRUE)

The same, but using a lookahead:

gsub("(?!')[[:punct:]]|'s", "", s, perl = TRUE)

We can do it in two steps: first remove all punctuation except "'", then remove "'s" using a fixed match:

gsub("'s", "", gsub("[^[:alnum:][:space:]']", "", s), fixed = TRUE)
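As a rough sketch, the two steps can be wrapped in a small helper and checked against the single-regex lookahead version. The function name remove_general_punct is made up for illustration; the sentence is the one from the question.

```r
# Sample sentence from the question
s <- "I don't want to remove punctuation for negations. Instead, I want to remove only general punctuation. For example, keep I wouldn't like it but remove Inter's fan or Man city's fan."

# Hypothetical helper wrapping the two-step approach
remove_general_punct <- function(x) {
  # Step 1: drop everything that is not a letter, digit, whitespace or apostrophe
  no_punct <- gsub("[^[:alnum:][:space:]']", "", x)
  # Step 2: drop the possessive 's, matched as a literal string
  gsub("'s", "", no_punct, fixed = TRUE)
}

two_step  <- remove_general_punct(s)
one_regex <- gsub("(?!')[[:punct:]]|'s", "", s, perl = TRUE)

# Both approaches should agree on this input
print(identical(two_step, one_regex))
print(two_step)
```

Note that removing every 's also turns contractions such as "it's" into "it", so this only fits text where 's is always a possessive.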
