r - How to remove numbered newlines from a string?



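I am cleaning some text data and have run into a problem with removing newline text. In this data, the text contains not only \n strings but also \n\n strings, as well as numbered newlines such as \n2 and \n\n2. The latter are my problem. How can I remove them with a regular expression?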

I am working in R. Here is some sample text and what I have tried so far:

# string
string <- "There is a square in the apartment. \n\n4Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten.\n2"
# code attempt
gsub("[\r\\n0-9]", '', string)

The problem with this regex is that it removes the numbers and also matches the letter n.

I would like the following output:

"There is a square in the apartment. Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten."

I have been using regexr as a reference.

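Written like this, the pattern [\r\\n0-9] is a character class that matches a single carriage return, one of the characters \ or n, or a digit 0-9. That is why it strips the letter n and every digit, wherever they occur in the text.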
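To see the effect concretely, the sketch below reruns the attempt from the question on the sample string and checks whether any digits survive. It assumes the sample string contains real newline escapes (\n), as described in the question; the variable name `attempt` is only illustrative.

```r
# Sample string from the question, with the newline escapes written out.
string <- "There is a square in the apartment. \n\n4Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten.\n2"

# The original attempt: one character class, tested against each character
# on its own, so every digit is a match no matter what precedes it.
attempt <- gsub("[\r\\n0-9]", "", string)

# Are there any digits left? This includes the ones in "4 laughs" and "9 times".
any_digits_left <- grepl("[0-9]", attempt)

print(attempt)
print(any_digits_left)
```

Because a character class has no notion of order, it cannot express "a digit that directly follows a newline"; that needs a sequence of two pieces.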

Instead, you can write a pattern that matches one or more carriage returns or newlines, followed by optional digits:

[\r\n]+[0-9]*

Example:

string <- "There is a square in the apartment. \n\n4Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten.\n2"
gsub("[\r\n]+[0-9]*", '', string)

Output
[1] "There is a square in the apartment. Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten."

See the R demo.
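If the same cleanup has to run over a whole column of text, the pattern can be wrapped in a small helper. This is only a sketch: the function name `strip_numbered_newlines` and the vector `texts` are hypothetical and do not come from the question. `gsub` is vectorised, so no loop is needed.

```r
# Hypothetical helper: remove runs of CR/LF plus any digits that directly follow.
strip_numbered_newlines <- function(x) {
  gsub("[\r\n]+[0-9]*", "", x)
}

# Sample string from the question.
string <- "There is a square in the apartment. \n\n4Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten.\n2"

# Made-up inputs showing that digits without a preceding newline are kept.
texts <- c("a\n1b", "no newline 7")

cleaned_string <- strip_numbered_newlines(string)
cleaned_texts <- strip_numbered_newlines(texts)

print(cleaned_string)
print(cleaned_texts)
```

One caveat with this approach: the pattern cannot tell a line marker from a real number that happens to start a line, so a text such as "Total:\n42 items" would lose the 42. If that can occur in the data, the digit part of the pattern would need to be narrowed.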
