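I'm cleaning some text data and have run into a problem with removing newline text. In this data the text contains not only \n strings but also \n\n strings, as well as numbered newlines such as \n2 and \n\n2. The latter are my problem. How can I remove these with a regular expression?
I'm working in R; here is some sample text and what I have tried so far: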
# string
string <- "There is a square in the apartment. \n\n4Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten.\n2"
# code attempt
gsub("[\r\\n0-9]", '', string)
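The problem with this regex is that it removes every digit and also matches the letter n.
I would like the following output:
"There is a square in the apartment. Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten."
I have been using RegExr as a reference.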
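Written like this, the pattern [\r\\n0-9] is a character class that matches any single one of: a carriage return, the character \ or n, or a digit 0-9. That is why it strips every digit and every letter n, wherever they occur.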
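To make that concrete, here is a minimal sketch in base R. The class [n0-9] and the input "nine 9 tens" are simplified stand-ins chosen for illustration, not the pattern or data from the question.

# In an R string literal, "\n" is one newline character,
# while "\\n" is two characters: a backslash followed by the letter n.
newline_len <- nchar("\n")
escaped_len <- nchar("\\n")

# A character class matches any single listed character, so listing
# n and 0-9 strips every letter n and every digit, wherever they occur.
demo <- gsub("[n0-9]", "", "nine 9 tens")

print(newline_len)  # 1
print(escaped_len)  # 2
print(demo)         # "ie  tes"

The digits and the letter n disappear independently of each other, which is the behaviour described in the question.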
Instead, you can write a pattern that matches one or more carriage returns or newlines, followed by optional digits:
[\r\n]+[0-9]*
For example:
string <- "There is a square in the apartment. \n\n4Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten.\n2"
gsub("[\r\n]+[0-9]*", '', string)
Output:
[1] "There is a square in the apartment. Great laughs, which I hear from the other room. 4 laughs. Several. 9 times ten."
See the R demo.
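Because gsub is vectorised, the same pattern can be applied to a whole character vector. The sketch below wraps it in a helper; the name strip_numbered_newlines and the sample inputs are hypothetical and only illustrate that digits not attached to a line break are left alone.

# Hypothetical helper: remove runs of CR/LF plus any digits that directly follow them.
strip_numbered_newlines <- function(x) {
  gsub("[\r\n]+[0-9]*", "", x)
}

# Made-up sample inputs, not taken from the question.
texts <- c(
  "first line\n2",        # numbered newline at the end
  "no breaks, 4 laughs",  # standalone digit, no line break
  "a\r\n\r\n10b"          # Windows-style breaks followed by digits
)

cleaned <- strip_numbered_newlines(texts)
print(cleaned)  # "first line" "no breaks, 4 laughs" "ab"

Note that this removes the line breaks without inserting a space, so words on either side of a break are joined unless the text already contains a space there, as the sample string in the question does.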