How do I build a Ruby regex that matches up to the first instance of a multi-character pattern?

I have a multi-line snippet of Ruby code from which I need to extract the arguments to a particular method, in this example the foo method:

code = "qux = define {\n  foo an arbitrary statement\n    that could go on for \n    several lines\n  bar 42\n  baz43\n}"

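From this I want to extract "an arbitrary statement\n    that could go on for \n    several lines". To do so, I want to capture everything between foo and the first instance of /^\s{2}\w+/, which marks the start of the next method and its arguments.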

My failed attempts (used as code.match(<example regex here>)[1]) include:

/foo(.*)\s{2}\w+/m
/^\s+foo(.*)^\s{2}\w+/m
/\n\s+foo(.*)\n\s{2}\w+/m
/\n\s+foo(.*)\n\s{2}\w+?/m

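and so on. None of them returns the "statement" I am looking for. The lazy/greedy operators have some effect, but they never eliminate all of the string that follows the target foo(.*) pattern.

Any suggestions?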

r = /
    (?<=     # begin a positive lookbehind
      \b     # match a (zero-width) word break
      foo    # match string
      [ ]    # match a space
    )        # close positive lookbehind
    .*?      # match zero or more chars, non-greedily
    (?=      # begin a positive lookahead
      \n     # match newline char
      [ ]{2} # match two spaces
      \w     # match a word char
    )        # close positive lookahead
    /xm      # free-spacing and multiline modes
code[r]
  #=> "an arbitrary statement\n    that could go on for \n    several lines"

Written conventionally, the regex is as follows.

/(?<=\bfoo ).*?(?=\n  \w)/m

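Note that in free-spacing mode I matched each space with a character class containing a single space ([ ]). Had I used plain spaces, as in the conventional regex above, they would have been stripped out, because free-spacing mode ignores whitespace in the pattern.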
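As a small illustration of that point, the sketch below compares a plain space with a [ ] class under the x flag. The sample string and variable names are made up for this example and do not come from the question.

```ruby
# Hypothetical sample input, not taken from the question.
sample = "foo bar\n  baz"

# With /x the plain space after "foo" is dropped from the pattern,
# so this behaves like /foo\w+/ and cannot match "foo bar".
literal_space = /foo \w+/x

# A space inside a character class is kept in free-spacing mode.
class_space = /foo[ ]\w+/x

p sample[literal_space]  #=> nil
p sample[class_space]    #=> "foo bar"
```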

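It is important that the expression matching the string to be returned is non-greedy (.*? rather than .*). If it were greedy (.*), we would get an incorrect result, as the following example shows: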

str = "foo oh my\n  a\n  b\n  c"
str[r]
  #=> "oh my"
str[/(?<=\bfoo ).*(?=\n  \w)/m]
  #=> "oh my\n  a\n  b"

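In the greedy case, .* gobbles up as much as it can, stopping only at the last place where the positive lookahead (?=\n  \w) can still match, namely just before "\n  c".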

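The word break (\b) ensures that we do not match, for example, "snafoo". It requires that "foo" be preceded by a non-word character or be at the very start of the string.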
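To make the effect of the word break concrete, here is a minimal sketch that runs the pattern with and without \b against a string where "foo" occurs only as the tail of "snafoo". The input string and variable names are assumptions made up for illustration.

```ruby
# Hypothetical input: "foo" appears only at the end of "snafoo".
decoy = "snafoo not this\n  bar 1"

with_break    = /(?<=\bfoo ).*?(?=\n  \w)/m
without_break = /(?<=foo ).*?(?=\n  \w)/m

# There is no word boundary between "a" and "f", so \b rejects it.
p decoy[with_break]     #=> nil

# Without \b the tail of "snafoo" is accepted as if it were "foo".
p decoy[without_break]  #=> "not this"
```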

Another way to write the regex is as follows.

code[/\bfoo \K.*?(?=\n  \w)/m]
  #=> "an arbitrary statement\n    that could go on for \n    several lines"

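\K can be read as "discard everything matched so far when returning the matched string". That is, the part before \K must still match; it is simply not used to form the returned match.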
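One place where this behavior is handy is substitution: since the text before \K is not part of the reported match, String#sub replaces only the argument and leaves "foo " untouched. The sketch below assumes the code string from the question; the replacement text "new_arg" is a placeholder invented for this example.

```ruby
code = "qux = define {\n  foo an arbitrary statement\n    that could go on for \n    several lines\n  bar 42\n  baz43\n}"

keep_re = /\bfoo \K.*?(?=\n  \w)/m

# The reported match starts after "foo ", even though "foo " had to match.
arg = code[keep_re]
p arg
#=> "an arbitrary statement\n    that could go on for \n    several lines"

# Only the part after \K is replaced; "new_arg" is a made-up placeholder.
rewritten = code.sub(keep_re, "new_arg")
p rewritten
#=> "qux = define {\n  foo new_arg\n  bar 42\n  baz43\n}"
```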

A final way to write the regex is to use a capture group.

code[/\bfoo (.*?)\n  \w/m, 1]
  #=> "an arbitrary statement\n    that could go on for \n    several lines"

The string of interest is captured in capture group 1, which String#[] returns when given the group number as its optional second argument.
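The question used code.match(...)[1], which raises NoMethodError when nothing matches because match returns nil. As a hedged sketch, the same capture can be read through a named group with a nil check, or through String#[], which simply returns nil. The group name args and the no_foo string are assumptions made up for this example.

```ruby
code = "qux = define {\n  foo an arbitrary statement\n    that could go on for \n    several lines\n  bar 42\n  baz43\n}"

# Named capture group; "args" is a hypothetical name chosen here.
m = code.match(/\bfoo (?<args>.*?)\n  \w/m)
args = m && m[:args]
p args
#=> "an arbitrary statement\n    that could go on for \n    several lines"

# Hypothetical input with no call to foo: String#[] returns nil
# instead of raising.
no_foo = "bar 42\n  baz 43"
missing = no_foo[/\bfoo (.*?)\n  \w/m, 1]
p missing  #=> nil
```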

Lastly, note that the \w near the end of the regex has the same effect as \w+ would.
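A quick way to check this, assuming the code string from the question: the trailing \w or \w+ only changes how far the overall match extends past the captured text, so capture group 1 is the same either way.

```ruby
code = "qux = define {\n  foo an arbitrary statement\n    that could go on for \n    several lines\n  bar 42\n  baz43\n}"

one_char   = /\bfoo (.*?)\n  \w/m
many_chars = /\bfoo (.*?)\n  \w+/m

# Group 1 is identical for both patterns.
p code[one_char, 1] == code[many_chars, 1]  #=> true

# Only the tail of the full match differs: "b" versus "bar".
p code[one_char][-4..]    #=> "\n  b"
p code[many_chars][-6..]  #=> "\n  bar"
```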
