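>I have a group of regular expressions and want to match the current line against each of them; whenever a match succeeds, I call a function that takes the captured groups as arguments.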
my %regexps = (
    "a" => qr/^(a)\s*(b)/o,
    "b" => qr/^(c)\s*(d)/o,
    "c" => qr/^(e)\s*(f)/o,
);
sub call_on_match {
    my $actions = shift;
    # ... some setup actions for $_
    while (my ($regexp, $func) = each(%$actions) ) {
        if (my @matches = /$regexp/){
            $func->(@matches);
        }
    }
}
call_on_match({
    $regexps{"a"} => \&some_funca,
    $regexps{"b"} => \&some_funcb,
    $regexps{"c"} => \&some_funcc,
});
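The problem is the expression my @matches = /$regexp/, which runs about 110k times and spends about 1 second in total on compilation (typical profiler output for this line: # spent 901ms making 107954 calls to main::CORE:regcomp, avg 8µs/call). My first guess was to drop the extra regex slashes, in case they make perl treat the pattern as a new regex that has to be compiled. I used my @matches = ($_ =~ $regexp), but without success. Is there any other way to keep perl from recompiling qr'ed regexes in this situation?

UPD: I replaced the hash with an array (of pairs such as [$regexps{"a"}, \&some_funca]):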
foreach my $group (@$actions){
    my ($regexp, $func) = @$group;
    if (my @matches = ($_ =~ $regexp)){
        $func->(@matches);
    }
}
Now it compiles faster, but the compilation has not gone away: # spent 51.7ms making 107954 calls to main::CORE:regcomp, avg 479ns/call
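Perl hash keys are always plain strings, so a qr// object used as a key is stringified and the match has to compile that string again. I suggest using IDs as the keys in both hashes, like this: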
use strict;
use warnings;
my %regexps = (
    a => qr/^(a)\s*(b)/,
    b => qr/^(c)\s*(d)/,
    c => qr/^(e)\s*(f)/,
);
sub call_on_match {
    my ($actions) = @_;
    # ... some setup actions for $_
    while (my ($regexp_id, $func) = each %$actions) {
        # Look the compiled regex up by ID so the qr// object is never stringified
        if (my @matches = $_ =~ $regexps{$regexp_id}) {
            $func->(@matches);
        }
    }
}
call_on_match(
    {
        a => \&some_funca,
        b => \&some_funcb,
        c => \&some_funcc,
    }
);
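
To see why keying the hash by the regex itself forces recompilation, here is a minimal sketch. The variable names (%by_regex, %by_id, and so on) are made up for illustration and do not come from the question. It only shows that a qr// object stops being a Regexp once it has been used as a hash key, while it survives intact as a hash value.

```perl
use strict;
use warnings;

# Hypothetical names throughout; only the qr// behaviour is the point.
my $re = qr/^(a)\s*(b)/;

# A hash key is always a plain string, so the Regexp object is flattened.
my %by_regex = ($re => 'action');
my ($key) = keys %by_regex;

# A hash value (or an array element) keeps the object intact.
my %by_id = (a => $re);

printf "key:   ref='%s'\n", ref($key);         # empty: just a string now
printf "value: ref='%s'\n", ref($by_id{a});    # Regexp

# Both still match, but the first pattern must be compiled from a string.
my @from_key   = 'a  b' =~ /$key/;
my @from_value = 'a  b' =~ $by_id{a};
print "from key:   @from_key\n";
print "from value: @from_value\n";
```

This is also consistent with the UPD in the question: storing the regexes in array elements kept them as objects, and the per-call regcomp cost dropped sharply.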