Using a regular expression to insert spaces between collapsed words

I'm working on a choropleth in R and need to be able to match state names with match.map(). The dataset I'm using runs multi-word names together, like NorthDakota and DistrictOfColumbia.

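How can I use a regular expression to insert a space wherever a lowercase letter is followed by an uppercase letter? I've managed to add a space, but I haven't been able to keep the letters that mark where the space belongs.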

places = c("NorthDakota", "DistrictOfColumbia")
gsub("[[:lower:]][[:upper:]]", " ", places)
[1] "Nort akota"       "Distric  olumbia"

Use parentheses to capture the matched expressions, then use \n (\\n in R, where n is the group number) to retrieve them:

places = c("NorthDakota", "DistrictOfColumbia")
gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", places)
## [1] "North Dakota"         "District Of Columbia"

You want to use capturing groups to capture the matched context, so that you can refer back to each matched group in the replacement.

To access a group, write two backslashes followed by the group number (\\1, \\2).

> places = c('NorthDakota', 'DistrictOfColumbia')
> gsub('([[:lower:]])([[:upper:]])', '\\1 \\2', places)
# [1] "North Dakota"         "District Of Columbia"
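
The doubled backslash is needed because R's string parser consumes one level of escaping before the regex engine ever sees the replacement. As a minimal sketch, assuming R 4.0.0 or later, a raw string literal can be used for the replacement so the backreferences are written with single backslashes (the variable name spaced is just for illustration):

```r
places <- c("NorthDakota", "DistrictOfColumbia")

# In a regular string literal each backslash must be doubled, so the
# regex engine receives five characters: backslash, 1, space, backslash, 2
nchar("\\1 \\2")

# Raw string literal (assumes R >= 4.0.0): backslashes are taken literally,
# so r"(\1 \2)" is the same string as "\\1 \\2"
spaced <- gsub("([[:lower:]])([[:upper:]])", r"(\1 \2)", places)
spaced
```

On older versions of R the raw string syntax is not available, and the doubled-backslash form shown above is the way to go.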

Another way is to switch on PCRE with perl=T and use a lookaround assertion.

> places = c('NorthDakota', 'DistrictOfColumbia')
> gsub('[a-z]\\K(?=[A-Z])', ' ', places, perl=T)
# [1] "North Dakota"         "District Of Columbia"

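The \K escape sequence resets the starting point of the reported match, so any previously consumed characters are no longer included. Basically, it throws away everything that was matched up to that point.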

[a-z]       # any character of: 'a' to 'z'
\K          # '\K' (resets the starting point of the reported match)
(?=         # look ahead to see if there is:
  [A-Z]     #   any character of: 'A' to 'Z'
)           # end of look-ahead
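
If \K feels obscure, a variant that should behave the same on this input pairs a lookbehind with the lookahead, so the pattern matches only the empty position between the two letters and consumes nothing. This is a sketch of that alternative, not part of the answer above, and the variable name spaced is just for illustration:

```r
places <- c("NorthDakota", "DistrictOfColumbia")

# (?<=[a-z])  position preceded by a lowercase letter
# (?=[A-Z])   position followed by an uppercase letter
# Both assertions are zero-width, so gsub() only inserts the space.
spaced <- gsub("(?<=[a-z])(?=[A-Z])", " ", places, perl = TRUE)
spaced
```

Like the \K version, this requires perl = TRUE, since lookaround assertions are not supported by R's default regex engine.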
