R - How to match strings that consist only of a given set of characters, in any order and any number of times



I can't seem to figure out how to match strings that contain only the characters "%", "§", and "#", in any order and repeated any number of times:

str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")

This pattern seems to get me close to a solution:

grepl("(?=[§#]*%)(?=[§%]*#)(?=[%#]*§)", str, perl = T)
[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

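Only the last match, !#%§, is wrong, because that string does not consist solely of the character set. I see why grepl matches it: the last three characters do make up the set. So the remaining question is how to restrict the match to the character set. I tried adding the anchors ^ and $, but then nothing matches at all: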

grepl("^(?=[§#]*%)(?=[§%]*#)(?=[%#]*§)$", str, perl = T)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

What is the solution here?

You can use:

^(?=.*%)(?=.*#)(?=.*§)[%#§]+$

Demo

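The trick is to make sure that every character in the string is one of the allowed characters. The lookaheads only check that each character is present; they consume nothing, so in addition to them we match the whole string with ^[%#§]+$.

Breakdown:

  • ^ - start of the string
  • (?=.*%) - positive lookahead ensuring that a "%" character is present
  • (?=.*#) - positive lookahead ensuring that a "#" character is present
  • (?=.*§) - positive lookahead ensuring that a "§" character is present
  • [%#§]+ - matches one or more characters from the character class
  • $ - end of the string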
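
To see the pattern at work in R, here is a minimal sketch run against the sample vector from the question. The variable names pattern and res are illustrative only and not part of the original answer; perl = TRUE is needed because lookaheads are a PCRE feature.

```r
# Sample vector from the question
str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")

# The three lookaheads require each character at least once;
# ^[%#§]+$ rejects any string that contains another character.
pattern <- "^(?=.*%)(?=.*#)(?=.*§)[%#§]+$"

res <- grepl(pattern, str, perl = TRUE)
print(res)
# [1] FALSE  TRUE FALSE  TRUE FALSE FALSE

# Keep only the strings that match
print(str[res])
```

The last element, "!#%§", is now rejected because "!" is not in the character class.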

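Another approach: use a lookahead to ensure that the string consists only of the three characters, then use capture groups and backreferences inside negative lookaheads to require three different characters: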

^(?=[%§#]+$)(.).*(?!\1)(.).*(?!\1|\2).

See the regex proof.

R proof:

str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")
grepl("^(?=[%§#]+$)(.).*(?!\\1)(.).*(?!\\1|\\2).", str, perl = TRUE)

Result: [1] FALSE TRUE FALSE TRUE FALSE FALSE
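
As a quick sanity check, the sketch below compares this backreference pattern with the lookahead-only pattern ^(?=.*%)(?=.*#)(?=.*§)[%#§]+$ on the question's vector plus a few extra inputs. The extra strings and all variable names are hypothetical additions for illustration. Note that the backslashes are doubled, as R string literals require.

```r
# Sample vector from the question
str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")

# Hypothetical extra inputs: two distinct characters only,
# the empty string, and a single character. None should match.
extra <- c("%%##", "", "#")

lookahead_only <- "^(?=.*%)(?=.*#)(?=.*§)[%#§]+$"
backref_based  <- "^(?=[%§#]+$)(.).*(?!\\1)(.).*(?!\\1|\\2)."

inputs <- c(str, extra)
a <- grepl(lookahead_only, inputs, perl = TRUE)
b <- grepl(backref_based, inputs, perl = TRUE)

# TRUE when both patterns classify every input the same way
agree <- identical(a, b)
print(agree)
```

Agreement on a handful of inputs is not a proof of equivalence, only a spot check.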

Explanation

--------------------------------------------------------------------------------
^                        the beginning of the string
--------------------------------------------------------------------------------
(?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
[%§#]+                   any character of: '%', '§', '#' (1 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
$                        before an optional \n, and the end of
                         the string
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
.                        any character except \n
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
.*                       any character except \n (0 or more times
                         (matching the most amount possible))
--------------------------------------------------------------------------------
(?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
\1                       what was matched by capture \1
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
(                        group and capture to \2:
--------------------------------------------------------------------------------
.                        any character except \n
--------------------------------------------------------------------------------
)                        end of \2
--------------------------------------------------------------------------------
.*                       any character except \n (0 or more times
                         (matching the most amount possible))
--------------------------------------------------------------------------------
(?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
\1                       what was matched by capture \1
--------------------------------------------------------------------------------
|                        OR
--------------------------------------------------------------------------------
\2                       what was matched by capture \2
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
.                        any character except \n
