R: trouble with quotes and an optional wildcard in a regular expression



Given this character vector

columnsToKeep <- c("W","L","Customer Rate", "Diff% from Base",
"StoreOcc%", "COMPPS","Avail","Days in Unit",
"DSRC","Rec New Price", "Rec Rate Chg",
"intScheduledMoveOuts","TI30","BR1Yr",
"RLMI","NM7D","Last Rate Change %", "Occ%", 
"Last Rate Change Amt", "BR", "MoveInRate",
"newRate",
"lengthOfStay", "mnyRentAtMoveIn", 
"rentPriorToRateChange","mnyRentAtMoveOut","status")

I tried this code

d<-columnsToKeep[grepl(" ", columnsToKeep)]
cat(gsub("(\\%?\\w+\\s+\\w+\\s*\\w*)", '`\\1`+', d))

which produces

`Customer Rate`+ Diff% `from Base`+ `Days in Unit`+ `Rec New Price`+ `Rec Rate Chg`+ `Last Rate Change`+ % `Last Rate Change`+ Amt

but I want this

`Customer Rate`+ `Diff% from Base`+ `Days in Unit`+ `Rec New Price`+ `Rec Rate Chg`+ `Last Rate Change %` + `Last Rate Change Amt`

Clearly, I am having trouble getting the regular expression right for the % sign.

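I'm not quite sure what you are trying to do with the regular expression, but it looks like you want to wrap each name in d in backticks and then join them with +. Two ways to get that are: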

cat(sapply(d, function(s) { paste0("`", s, "`") }), sep="+ ")

cat(gsub("$", "`", gsub("^", "`", d)), sep="+ ")

I am guessing you are just looking for entries that contain at least one space between words. You could try: "([\w%]+ [\w%]+(?: [\w%]+)*?)"

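It looks like you are only trying to match letters and %, so using \w may be a bit risky (it also matches 0-9 and _). You can be more specific with "([A-Za-z%]+ [A-Za-z%]+(?: [A-Za-z%]+)*?)"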
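A minimal sketch of how the letters-and-% pattern could be applied with gsub, assuming the goal is the backtick-wrapped output from the question. Two things here are assumptions rather than part of the answer above: the backslashes and the pattern are written as an R string with perl = TRUE so that the (?: ) group is handled by PCRE, and the trailing quantifier is the greedy `*` instead of `*?`. With the lazy `*?` and nothing after it in the pattern, the optional group matches as few times as possible, so a three-word name such as "Rec New Price" would be cut off after two words.

```r
# Names from the question that contain a space.
d <- c("Customer Rate", "Diff% from Base", "Days in Unit",
       "Rec New Price", "Rec Rate Chg",
       "Last Rate Change %", "Last Rate Change Amt")

# Greedy variant of the pattern (assumption: `*` instead of `*?`).
pattern <- "([A-Za-z%]+ [A-Za-z%]+(?: [A-Za-z%]+)*)"

# perl = TRUE selects PCRE, which supports the (?: ) non-capturing group.
wrapped <- gsub(pattern, "`\\1`+", d, perl = TRUE)
cat(wrapped, "\n")

# For comparison: the lazy form stops after the second word.
lazy <- gsub("([A-Za-z%]+ [A-Za-z%]+(?: [A-Za-z%]+)*?)", "`\\1`+",
             "Rec New Price", perl = TRUE)
```

Note that every element, including the last, ends in a +; if the trailing + is unwanted, joining with cat(..., sep="+ ") as in the first answer avoids it.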

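Also, note that \s does not mean just a space (" "). It is the set of all whitespace characters, so it will also match newlines, carriage returns, and tabs. If you want to match a space, just use a literal space.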
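The difference can be checked quickly with grepl. This is only an illustrative sketch; the vector x is made up for the example.

```r
# Three strings that differ only in the whitespace character between a and b.
x <- c("a b", "a\tb", "a\nb")

# \s (written "\\s" in an R string) matches space, tab, and newline.
with_class <- grepl("a\\sb", x)

# A literal space matches only the space character.
with_space <- grepl("a b", x)

print(with_class)
print(with_space)
```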
