Regex timing out



I am trying to match something like this:

foo: anything after the colon can be matched with (.*)+
foo.bar1.BAZ: balh5317{}({}(

This is the regex I'm using:

/^((?:(?:(?:[A-Za-z_]+)(?:[0-9]+)?)+[.]?)+)(?:\s)?(?::)(?:\s)?((?:.*)+)$/

Please excuse the non-capturing groups and extra parentheses; this is compiled from a builder class.

This works for the examples. The problem comes up when I try to feed it a string like this:

foo.bar.baz.beef.stew.ect.and.forward

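I need to be able to check strings like this, but every time, the regex engine times out or runs forever (as far as I can tell) once there are more than a certain number of foo. segments.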

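I'm sure this is a logic problem I could work out, but unfortunately I'm far from mastering regular expressions, and I'm hoping a more experienced user can shed some light on how to make this more efficient.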

Also, here is a more detailed description of what I need to match:

Property Name: can contain A-z, numbers, and underscores but can't start with a number
<Property Name>.<Property Name>.<Prop...:<Anything after the colon>

Thanks for your time!

Starting with your regex:

^((?:(?:(?:[A-Za-z_]+)(?:[0-9]+)?)+[.]?)+)(?:\s)?(?::)(?:\s)?((?:.*)+)$

^                                   # Anchors to the beginning of the string.
(                                   # Opens CG1
    (?:                             # Opens NCG
        (?:                         # Opens NCG
            (?:                     # Opens NCG
                [A-Za-z_]+          # Character class (any of the characters within)
            )                       # Closes NCG
            (?:                     # Opens NCG
                [0-9]+              # Character class (any of the characters within)
            )?                      # Closes NCG
        )+                          # Closes NCG
        [.]?                        # Character class (any of the characters within)
    )+                              # Closes NCG
)                                   # Closes CG1
(?:                                 # Opens NCG
    \s                              # Token: \s (white space)
)?                                  # Closes NCG
(?:                                 # Opens NCG
    :                               # Literal :
)                                   # Closes NCG
(?:                                 # Opens NCG
    \s                              # Token: \s (white space)
)?                                  # Closes NCG
(                                   # Opens CG2
    (?:                             # Opens NCG
        .*                          # . denotes any single character, except for newline
    )+                              # Closes NCG
)                                   # Closes CG2
$                                   # Anchors to the end of the string.

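I converted [0-9] to \d, just to make it easier to read (both match the same thing). I also removed many of the non-capturing groups, since they weren't really doing anything.

^((?:(?:[A-Za-z_]+\d*)+\.?)+)\s?:\s?((?:.*)+)$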

I also merged \s and .* into [\s\S]*, but since it was followed by a + sign, I dropped the group and just used [\s\S]+.

^((?:(?:[A-Za-z_]+\d*)+\.?)+)\s?:([\s\S]+)$
                      ^

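Now, I'm not sure what the + above the caret is supposed to do. We can drop it, and with it the non-capturing group around it.

^((?:[A-Za-z_]+\d*\.?)+)\s?:([\s\S]+)$

Explanation: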

^                           # Anchors to the beginning of the string.
(                           # Opens CG1
    (?:                     # Opens NCG
        [A-Za-z_]+          # Character class (any of the characters within)
        \d*                 # Token: \d (digit)
        \.?                 # Literal .
    )+                      # Closes NCG
)                           # Closes CG1
\s?                         # Token: \s (white space)
:                           # Literal :
(                           # Opens CG2
    [\s\S]+                 # Character class (any of the characters within)
)                           # Closes CG2
$                           # Anchors to the end of the string.
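
As a quick sanity check, the sketch below runs the original and the simplified pattern against the two sample inputs from the question and prints the first capture group for each. Python is an assumption here, since the question does not name a language, and property_path is a hypothetical helper. It only feeds inputs that do match; neither pattern is safe yet on a long input without a colon.

```python
import re

# Original pattern from the question, with the \s escapes in place.
ORIGINAL = re.compile(
    r"^((?:(?:(?:[A-Za-z_]+)(?:[0-9]+)?)+[.]?)+)(?:\s)?(?::)(?:\s)?((?:.*)+)$"
)
# Simplified pattern from the step above.
SIMPLIFIED = re.compile(r"^((?:[A-Za-z_]+\d*\.?)+)\s?:([\s\S]+)$")

# The two sample inputs given in the question.
SAMPLES = [
    "foo: anything after the colon can be matched with (.*)+",
    "foo.bar1.BAZ: balh5317{}({}(",
]


def property_path(pattern, line):
    """Return capture group 1 (the dotted property path), or None."""
    m = pattern.match(line)
    return m.group(1) if m else None


for line in SAMPLES:
    print(property_path(ORIGINAL, line), property_path(SIMPLIFIED, line))
```

The second capture group is deliberately not compared: the simplified pattern no longer consumes the optional whitespace after the colon, so its value keeps a leading space.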

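Now, if you are dealing with multiple lines, you may want to change [\s\S]+ back to .*. There are a few different options here, but the language you are using matters.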
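
To make the multi-line point concrete, here is a minimal sketch in Python (an assumption, since the question does not say which language is in use). NAME, WHOLE_INPUT and PER_LINE are hypothetical names, and NAME is a name pattern assumed for illustration. With [\s\S]+ a single match runs across newlines; with .* and the re.MULTILINE flag each line is matched on its own.

```python
import re

# Dotted property path: each segment starts with a letter or underscore.
NAME = r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*"

# [\s\S]+ crosses newlines, so one match swallows the rest of the input.
WHOLE_INPUT = re.compile(rf"^({NAME})\s?:([\s\S]+)$")
# .* stops at a newline; re.MULTILINE lets ^ and $ anchor at every line.
PER_LINE = re.compile(rf"^({NAME})\s?:(.*)$", re.MULTILINE)

text = "foo: one\nbar.baz: two"

whole = WHOLE_INPUT.match(text)
print(repr(whole.group(2)))    # value of the single match over the whole text
print(PER_LINE.findall(text))  # one (path, value) pair per line
```

Other engines spell these switches differently (or lack them), which is why the choice depends on the language.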

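Honestly, I did this in steps, but the biggest problem was (?:.*)+, which tells the engine to match 0 or more characters, 1 or more times. That is a recipe for catastrophic backtracking (as Xufox linked in the comments). The nested quantifiers in the name group, (?:(?:...)+...)+, have the same shape, so an input with no colon forces the engine to try every way of splitting the name before it can fail.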
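
The growth is easy to observe on small inputs. The sketch below, in Python as an assumed language, times a failing match for a nested-quantifier name group and for a flat alternative on short runs of letters with no colon. NESTED, FLAT and seconds_to_fail are hypothetical names, and the lengths are kept small on purpose, because the nested version slows down sharply with every extra character. Absolute timings depend on the machine and the engine, so none are quoted here.

```python
import re
import time

# Name group with a quantified group inside a quantified group, then the colon.
NESTED = re.compile(r"^((?:(?:[A-Za-z_]+\d*)+\.?)+)\s?:")
# Flat alternative: every segment can only be matched in one way.
FLAT = re.compile(r"^([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s?:")


def seconds_to_fail(pattern, text):
    """Time a single match attempt that is expected to fail."""
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the inputs used below contain no colon
    return elapsed


for n in (6, 9, 12):
    text = "a" * n
    print(n, seconds_to_fail(NESTED, text), seconds_to_fail(FLAT, text))
```

On a backtracking engine the NESTED column should climb steeply as n grows, while the FLAT column stays roughly level.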

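The resulting regex, like your original, also allows variable names that end in a dot. I would use something more like the one below; your regex really wasn't far off.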

It still matches names like foo.ba5r, with a digit in the middle of a segment, if that is acceptable.

^([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s?:([\s\S]+)$

Explanation:

^                           # Anchors to the beginning of the string.
(                           # Opens CG1
    [A-Za-z_]               # Character class (any of the characters within)
    \w*                     # Token: \w (a-z, A-Z, 0-9, _)
    (?:                     # Opens NCG
        \.                  # Literal .
        [A-Za-z_]           # Character class (any of the characters within)
        \w*                 # Token: \w (a-z, A-Z, 0-9, _)
    )*                      # Closes NCG
)                           # Closes CG1
\s?                         # Token: \s (white space)
:                           # Literal :
(                           # Opens CG2
    [\s\S]+                 # Character class (any of the characters within)
)                           # Closes CG2
$                           # Anchors to the end of the string.
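
For reference, the breakdown above maps almost line for line onto a commented pattern. The following is a minimal sketch in Python, assuming that language only for illustration; PROPERTY_LINE and parse are hypothetical names, and re.VERBOSE is Python's flag for ignoring whitespace and # comments inside a pattern.

```python
import re

PROPERTY_LINE = re.compile(
    r"""
    ^
    (                             # group 1: dotted property path
        [A-Za-z_] \w*             #   first segment, must not start with a digit
        (?: \. [A-Za-z_] \w* )*   #   further segments, each after a literal dot
    )
    \s? :                         # optional whitespace, then the colon
    ( [\s\S]+ )                   # group 2: everything after the colon
    $
    """,
    re.VERBOSE,
)


def parse(line):
    """Return (path, value) for a matching line, or None otherwise."""
    m = PROPERTY_LINE.match(line)
    return (m.group(1), m.group(2)) if m else None


for line in (
    "foo.bar1.BAZ: balh5317{}({}(",
    "foo.bar.baz.beef.stew.ect.and.forward",  # no colon, so no match
    "foo.: trailing dot",
    "1foo: leading digit",
):
    print(repr(line), "->", parse(line))
```

Only the first line parses. The colon-less string from the question is rejected without the long wait, because each segment can now be matched in only one way.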
