Perl regex with a negative lookahead behaves unexpectedly



I am trying to match /ezmlm-(any word except 'weed' or 'return')\s+/ with a regex. The script below shows a foreach loop that does the right thing, and an attempted regex that almost does:

#!/usr/bin/perl
use strict;
use warnings;
my @tests = (
    {  msg => "want 'yes', string has ezmlm, but not weed or return",
       str => q[|/usr/local/bin/ezmlm-reject '<snip>'],
    },
    {  msg => "want 'yes', array  has ezmlm, but not weed or return",
       str => [ <DATA> ],
    },
    {  msg => "want 'no' , has ezmlm-weed",
       str => q[|/usr/local/bin/ezmlm-weed '<snip>'],
    },
    {  msg => "want 'no' , doesn't have ezmlm-anything",
       str => q[|/usr/local/bin/else '<snip>'],
    },
    {  msg => "want 'no' , ezmlm email pattern",
       str => q[crazy/but/legal/ezmlm-wacky@example.org],
    },
);
print "foreach regex\n";
foreach ( @tests ) {
    print doit_fe( ref $_->{str} ? @{$_->{str}} : $_->{str} ) ? "yes" : "no";
    print "\t";
    print doit_re( ref $_->{str} ? @{$_->{str}} : $_->{str} ) ? "yes" : "no";
    print "\t<--- $_->{msg}\n";
};
# for both of the following subs:
#   @_ will contain one or more lines of data
#   match the pattern /ezmlm-(any word except 'weed' or 'return')\s+/
sub doit_fe {
    my $has_ezmlm = 0;
    foreach ( @_ ) {
        next if $_ !~ m/ezmlm-(.*?)\s/;
        return 0 if $1 eq 'weed' or $1 eq 'return';
        $has_ezmlm++;
    };
    return $has_ezmlm;
};
sub doit_re { return grep /ezmlm-(?!weed|return)/, @_; };
__DATA__
|/usr/local/bin/ezmlm-reject '<snip>'
|/usr/local/bin/ezmlm-issubn '<snip>'
|/usr/local/bin/ezmlm-send '<snip>'
|/usr/local/bin/ezmlm-archive '<snip>'
|/usr/local/bin/ezmlm-warn '<snip>'

The output of the sample program is:

foreach regex
yes yes <--- want 'yes', string has ezmlm, but not weed or return
yes yes <--- want 'yes', array  has ezmlm, but not weed or return
no  no  <--- want 'no' , has ezmlm-weed
no  no  <--- want 'no' , doesn't have ezmlm-anything
no  yes <--- want 'no' , ezmlm email pattern

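In the last case the regex gets it wrong and matches a silly but legal email address. If I change the regex to put a \s+ after the negative lookahead, like this:

grep /ezmlm-(?!weed|return)\s+/

the regex fails to match at all. I assume this has to do with how negative lookahead works. I tried making the negation non-greedy, but there seems to be some lesson buried in "perldoc perlre" that I have missed. Is it possible to do this with a single regex?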

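A negative lookahead is zero-width: it checks what follows without consuming any characters. That means the regex

/ezmlm-(?!weed|return)\s+/

matches only when one or more whitespace characters immediately follow "ezmlm-".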
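
To make the zero-width behavior concrete, here is a minimal sketch. The two test strings are made up for illustration. The first has a word after "ezmlm-", so \s+ has nothing to match at that position; the second has whitespace directly after the hyphen, which is the only shape this pattern accepts.

```perl
use strict;
use warnings;

# The lookahead only inspects the text after "ezmlm-"; it consumes nothing,
# so \s+ has to match at the position right after the hyphen.
my $re = qr/ezmlm-(?!weed|return)\s+/;

# Illustrative inputs (not taken from the question's test data).
my @samples = ( "ezmlm-reject '<snip>'", "ezmlm- '<snip>'" );

for my $str (@samples) {
    my $verdict = $str =~ $re ? "match" : "no match";
    print "$verdict: $str\n";
}
```

The first string reports "no match" and the second reports "match".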

The pattern
/ezmlm-(?!weed|return)/

will match

"crazy/but/legal/ezmlm-wacky@example.org"

because it contains "ezmlm-" that is not followed by "weed" or "return". Try this instead:

/ezmlm-(?!weed|return)\S+\s+/

where \S+ is one or more non-whitespace characters (or use [^@\s]+ if you want to reject email addresses even when they are followed by whitespace).
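
A quick way to compare the two variants is to run both against strings like the ones in the question. This is only a sketch: the variable names $loose and $strict are invented here, and the last case (an address with a trailing newline, as you would get from an unchomped line) is an added assumption to show where the two patterns differ.

```perl
use strict;
use warnings;

my $loose  = qr/ezmlm-(?!weed|return)\S+\s+/;       # \S+ variant
my $strict = qr/ezmlm-(?!weed|return)[^@\s]+\s+/;   # [^@\s]+ variant

my @cases = (
    q[|/usr/local/bin/ezmlm-reject '<snip>'],
    q[|/usr/local/bin/ezmlm-weed '<snip>'],
    q[crazy/but/legal/ezmlm-wacky@example.org],
    q[crazy/but/legal/ezmlm-wacky@example.org] . "\n",   # unchomped line
);

for my $str (@cases) {
    my $l = $str =~ $loose  ? "yes" : "no";
    my $s = $str =~ $strict ? "yes" : "no";
    ( my $shown = $str ) =~ s/\n/\\n/g;    # make the newline visible
    print "$l\t$s\t$shown\n";
}
```

Both patterns accept the ezmlm-reject line and reject the ezmlm-weed line and the bare address. Only the \S+ variant accepts the address once it is followed by a newline, which is the case the [^@\s]+ variant is meant to exclude.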
