r - Regex: difference between the end and the edge of a word

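The R help file for regular expressions says:

The symbols \< and \> match the empty string at the beginning and end of a word, respectively. The symbol \b matches the empty string at either edge of a word.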

What is the difference between the end and the edge (of a word)?

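The difference between \b and \< / \> is that \b can also be used in PCRE regex patterns (when you specify perl=TRUE) and in ICU regex patterns (the stringr package), whereas \< and \> only work with the default regex engine.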

> s = "no where nowhere"
> sub("\\<no\\>", "", s)
[1] " where nowhere"
> sub("\\<no\\>", "", s, perl=TRUE) ## \< and \> do not work with PCRE
[1] "no where nowhere"
> sub("\\bno\\b", "", s, perl=TRUE) ## \b works with PCRE
[1] " where nowhere"
> library(stringr)
> str_replace(s, "\\bno\\b", "") ## \b works with ICU
[1] " where nowhere"
> str_replace(s, "\\<no\\>", "") ## \< and \> do not work with ICU
[1] "no where nowhere"

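The advantage of \< (which always matches the beginning of a word) and \> (which always matches the end of a word) is that they are unambiguous. \b can match both positions.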
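To make the ambiguity concrete, here is a minimal sketch using the default regex engine (no perl=TRUE). The test string x is made up for illustration. Each pattern places an anchor next to a letter, so that only one kind of word edge is possible at that position:

```r
x <- "no where"

# "o" followed by an anchor: the only candidate position is the END of "no".
end_gt <- grepl("o\\>", x)  # TRUE:  \> matches the end of a word
end_lt <- grepl("o\\<", x)  # FALSE: \< never matches the end of a word
end_b  <- grepl("o\\b", x)  # TRUE:  \b matches either edge

# An anchor followed by "w": the only candidate position is the START of "where".
start_lt <- grepl("\\<w", x)  # TRUE:  \< matches the start of a word
start_gt <- grepl("\\>w", x)  # FALSE: \> never matches the start of a word
start_b  <- grepl("\\bw", x)  # TRUE:  \b matches either edge
```

In both cases \b succeeds, while \< and \> each succeed only at their own side of the word.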

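One more thing to consider (reference):

The POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word boundaries (e.g., pattern = "\\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of "word" is system-dependent).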
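As a small illustration of that advice, the sketch below marks every word edge in the example string with perl = TRUE. The variable name marked and the "|" marker are arbitrary choices made for this example, and the input is ASCII only, so the non-ASCII caveat does not come into play here:

```r
s <- "no where nowhere"

# Insert a visible marker at every word edge, using the PCRE engine
# as the documentation recommends for repeated word-boundary matches.
marked <- gsub("\\b", "|", s, perl = TRUE)
print(marked)  # [1] "|no| |where| |nowhere|"
```

Every word ends up with a marker on both sides, which again shows that \b matches both the beginning and the end of a word.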
