Capturing groups with reluctant, greedy, and possessive quantifiers



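I'm practicing Java regular expressions with Oracle's tutorial. To better understand greedy, reluctant, and possessive quantifiers, I created some examples. My question is how these quantifiers work when applied to a capturing group. I don't understand quantifiers used this way; the reluctant quantifier, for example, looks as if it does nothing at all. I have also searched the web a lot and only ever see expressions like (.*?). Why do people usually write quantifiers in that form rather than something like "(.foo)??"?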

Here is the reluctant example:

Enter your regex: (.foo)??
Enter input string to search: xfooxxxxxxfoo
I found the text "" starting at index 0 and ending at index 0.
I found the text "" starting at index 1 and ending at index 1.
I found the text "" starting at index 2 and ending at index 2.
I found the text "" starting at index 3 and ending at index 3.
I found the text "" starting at index 4 and ending at index 4.
I found the text "" starting at index 5 and ending at index 5.
I found the text "" starting at index 6 and ending at index 6.
I found the text "" starting at index 7 and ending at index 7.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.
I found the text "" starting at index 10 and ending at index 10.
I found the text "" starting at index 11 and ending at index 11.
I found the text "" starting at index 12 and ending at index 12.
I found the text "" starting at index 13 and ending at index 13.

For the reluctant case, shouldn't it show "xfoo" for index 0 and index 4? Here is the possessive example:

Enter your regex: (.foo)?+ 
Enter input string to search: afooxxxxxxfoo
I found the text "afoo" starting at index 0 and ending at index 4
I found the text "" starting at index 4 and ending at index 4.
I found the text "" starting at index 5 and ending at index 5.
I found the text "" starting at index 6 and ending at index 6.
I found the text "" starting at index 7 and ending at index 7.
I found the text "" starting at index 8 and ending at index 8.
I found the text "xfoo" starting at index 9 and ending at index 13.
I found the text "" starting at index 13 and ending at index 13.

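And for the possessive case, shouldn't it try the input only once? I'm really confused, especially by this one, because I have tried every possibility.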

Thanks in advance!

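The regex engine (basically) walks through the string one character at a time, starting from the left, trying to make the characters fit your pattern. It returns the first match it finds.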

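A reluctant quantifier applied to a subpattern means the regex engine gives priority to (that is, tries first) whatever follows that subpattern.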

See what happens, step by step, when .*?b is applied to aabab:

aabab # we try to make '.*?' match zero '.', skipping it directly to try and 
^     # ... match b: that doesn't work (we're on an 'a'), so we reluctantly
      # ... backtrack and match one '.' with '.*?'
aabab # again, we by default try to skip the '.' and go straight for b:
 ^    # ... again, doesn't work. We reluctantly match two '.' with '.*?'
aabab # FINALLY there's a 'b'. We can skip the '.' and move forward:
  ^   # ... the 'b' in '.*?b' matches, the regex is done, 'aab' is the overall match
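
The walkthrough above can be checked with a minimal Java sketch. The class and method names (ReluctantVsGreedy, allMatches) are made up for illustration; only java.util.regex comes from the standard library. It compares the reluctant .*?b with the greedy .*b on the same input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantVsGreedy {
    // Hypothetical helper: collects every match reported by successive find() calls.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Reluctant: '.*?' grows one character at a time until 'b' can match.
        System.out.println(allMatches(".*?b", "aabab")); // [aab, ab]
        // Greedy: '.*' takes everything, then backs off to the last 'b'.
        System.out.println(allMatches(".*b", "aabab"));  // [aabab]
    }
}
```

The first reluctant match is "aab", as in the trace; the second match, "ab", comes from the next find() call resuming at index 3.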

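In your pattern there is no equivalent of the b. (.foo) is optional, and the engine gives priority to the part of the pattern that follows it, which is nothing. Nothing matches the empty string, so an overall match is found right away, and it is always an empty string.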
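
As a rough illustration of this point, the sketch below runs the reluctant group on its own and then with something after it. The input "xfoobar", the extra pattern (.foo)??bar, and the names OptionalGroupDemo and describe are all assumptions chosen for the example; none of them come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // Hypothetical helper: describes the first match and what group 1 captured.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "\"" + m.group() + "\" at " + m.start() + "-" + m.end()
                + ", group 1 = " + m.group(1);
    }

    public static void main(String[] args) {
        // Nothing follows the group, so skipping it succeeds immediately.
        System.out.println(describe("(.foo)??", "xfoobar"));
        // "bar" must match next; skipping the group fails at 'x', so the
        // engine backtracks and lets the group match "xfoo".
        System.out.println(describe("(.foo)??bar", "xfoobar"));
        // Greedy optional group: tries the group first.
        System.out.println(describe("(.foo)?", "xfoobar"));
    }
}
```

This is also a plausible reason forms like (.*?) followed by more pattern are so common: a reluctant quantifier only has a visible effect when something after it forces the engine to expand it.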


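As for possessive quantifiers, you are confused about what they do. They have no direct bearing on the number of matches; a possessive quantifier only stops the engine from backtracking into what the quantifier has already matched. It isn't clear which tool you are using to apply your regex, but it evidently looks for global matches, which is why it doesn't stop at the first match.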
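
A minimal sketch of both points, assuming the tool simply loops over Matcher.find() (that loop is a guess about the tool, not something the question confirms). The class and method names and the a?a / a?+a patterns are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveDemo {
    // Hypothetical helper: a find() loop, i.e. a "global" search over the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // The loop, not the quantifier, decides how many matches are reported.
        System.out.println(findAll("(.foo)?+", "afooxxxxxxfoo"));

        // Where possessive actually differs: it never gives back what it took.
        System.out.println(Pattern.matches("a?a", "a"));  // true: 'a?' gives the 'a' back
        System.out.println(Pattern.matches("a?+a", "a")); // false: 'a?+' keeps the 'a'
    }
}
```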

For more information about them, see http://www.regular-expressions.info/possessive.html.

Also, as HamZa pointed out, the post at https://stackoverflow.com/a/22944075 is becoming a good reference for regex-related questions.
