Perl: matching against an array of groups of precompiled regexps

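> I have several groups of regular expressions and want to match the current line against each regexp. When a match succeeds, I want to call a function that takes the captured groups as its arguments.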

my %regexps = (
    "a" => qr/^(a)\s*(b)/o,
    "b" => qr/^(c)\s*(d)/o,
    "c" => qr/^(e)\s*(f)/o,
);
sub call_on_match {
    my $actions = shift;
    # ... some setup actions for $_
    while (my ($regexp, $func) = each(%$actions) ) {
        if (my @matches = /$regexp/){
            $func->(@matches);
        }
    }
}
call_on_match({
    $regexps{"a"} => \&some_funca,
    $regexps{"b"} => \&some_funcb,
    $regexps{"c"} => \&some_funcc,
});

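The problem is the expression my @matches = /$regexp/. It runs about 110k times, and regexp compilation takes about 1 second in total (typical profiler output for this line: # spent 901ms making 107954 calls to main::CORE:regcomp, avg 8µs/call). My first guess was to remove the extra regexp slashes, in case they made perl treat it as a new regexp that had to be compiled. I used my @matches = ($_ =~ $regexp), but that did not help. Is there any other way to keep perl from recompiling qr'ed regexps in this case?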

UPD: I replaced the hash with an array (of pairs like [$regexps{"a"}, \&some_funca]):

foreach my $group (@$actions){
    my ($regexp, $func) = @$group;
    if (my @matches = ($_ =~ $regexp)){
        $func->(@matches);
    }
}

Now the compilation step is much faster, but it has not gone away: # spent 51.7ms making 107954 calls to main::CORE:regcomp, avg 479ns/call
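For reference, here is a minimal runnable sketch of the array-of-pairs approach from the update. The handler bodies, the @seen array and the sample input lines are hypothetical and exist only so the script produces visible output; the dispatch loop itself follows the snippet above.

```perl
use strict;
use warnings;

# Hypothetical handlers: each records the captured groups it receives.
my @seen;
sub some_funca { push @seen, "a:@_" }
sub some_funcb { push @seen, "b:@_" }

# Array of [compiled regexp, handler] pairs. The qr// objects are never
# used as hash keys, so they stay Regexp objects.
my @actions = (
    [ qr/^(a)\s*(b)/, \&some_funca ],
    [ qr/^(c)\s*(d)/, \&some_funcb ],
);

sub call_on_match {
    my ($actions) = @_;
    foreach my $group (@$actions) {
        my ($regexp, $func) = @$group;
        # List-context match returns the captured groups on success.
        if (my @matches = ($_ =~ $regexp)) {
            $func->(@matches);
        }
    }
}

# Sample input lines (assumed); each one is matched through $_.
for ("a  b", "cd", "xyz") {
    call_on_match(\@actions);
}

print "$_\n" for @seen;
```

With these sample lines the script records one call per matching line and nothing for "xyz".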

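Hash keys are always plain strings, so a qr// object used as a key is stringified and has to be compiled again when it is matched. I suggest you use the IDs as keys in both hashes, like this: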

use strict;
use warnings;
my %regexps = (
  a => qr/^(a)\s*(b)/,
  b => qr/^(c)\s*(d)/,
  c => qr/^(e)\s*(f)/,
);
sub call_on_match {
  my ($actions) = @_;
  # ... some setup actions for $_
  while (my ($regexp_id, $func) = each %$actions) {
    if (my @matches = $_ =~ $regexps{$regexp_id}) {
      $func->(@matches);
    }
  }
}
call_on_match(
  {
    a => \&some_funca,
    b => \&some_funcb,
    c => \&some_funcc,
  }
);
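
A quick way to see the difference between the two layouts is to check what ref() reports for a regexp that was used as a hash key versus one stored as a hash value. This is only an illustrative check; the names %by_regexp and %by_id are made up for the example.

```perl
use strict;
use warnings;

my $re = qr/^(a)\s*(b)/;

# Used as a hash key, the compiled regexp is turned into a string.
my %by_regexp = ($re => 'handler');
my ($key) = keys %by_regexp;

# Stored as a hash value, it stays a compiled Regexp object.
my %by_id = (a => $re);

# ref() returns an empty string for anything that is not a reference.
my $key_type   = ref($key)      || 'plain string';
my $value_type = ref($by_id{a}) || 'plain string';

print "key:   $key_type\n";    # key:   plain string
print "value: $value_type\n";  # value: Regexp
```

The key comes back as an ordinary string, which is why matching against it triggers a fresh compilation, while the value looked up by ID is still the original compiled object.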
