How to match multiple items in Perl

my $text = '<span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small>';

if ($text =~ m/<span>by <small class="author" itemprop="author">(.+?)<\/small>/ig){
    $author = $1;
    $authorcount{$author} += 1;
}
$authorcounttxt = "authorcount.txt";
open (OUTPUT3, ">$authorcounttxt");
foreach $author (sort { $authorcount{$b} <=> $authorcount{$a} } keys %authorcount){
    print OUTPUT3 ("$author\t\t$authorcount{$author}\n");
}
close (OUTPUT3);

The expected output is:

J.K. Rowling 3

However, I only get:

J.K. Rowling 1

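if ($text =~ m/.../ig){
    $author = $1;
    $authorcount{$author} += 1;
}

This is an if statement, which means the inner block is entered at most once, namely when there is a first match. You probably want a while statement, so that the inner block is entered once for every match:

while ($text =~ m/.../ig){
    $author = $1;
    $authorcount{$author} += 1;
}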
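
The difference between the two statements can be checked with a small counting sketch. The shortened markup, the `m{...}` delimiters, and the variable names `$if_count` and `$while_count` are assumptions made for illustration; they are not from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input: the same author markup repeated three times.
my $text = '<small class="author">J.K. Rowling</small>' x 3;

# "if" evaluates the match a single time, so the block runs for the first match only.
my $if_count = 0;
if ($text =~ m{<small class="author">(.+?)</small>}g) {
    $if_count++;
}

# The /g match above left a search position on $text; rewind it to the start.
pos($text) = 0;

# "while" re-evaluates the match on every pass. In scalar context, /g resumes
# where the previous match ended and fails once no matches are left.
my $while_count = 0;
while ($text =~ m{<small class="author">(.+?)</small>}g) {
    $while_count++;
}

print "if: $if_count, while: $while_count\n";   # prints "if: 1, while: 3"
```

Using `m{...}` instead of `m/.../` avoids having to escape the slash in `</small>`.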

Replace your if with while to iterate over every match of your regex, not just the first one:

while ($text =~ m/<span>by <small class="author" itemprop="author">(.+?)<\/small>/ig){
    $author = $1;
    $authorcount{$author} += 1;
}

It must also be noted that parsing HTML with regexes is fraught with danger. Consider using a module that parses HTML properly, such as Mojo::DOM.

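As the previous posters pointed out, the problem is hidden in if ( $text =~ /.../gi ): it evaluates to true and the block is executed only once.

What you are looking for is to process the matches in list context, which can be done with a for or while loop.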
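
As a minimal sketch of what list context means here: when a `/g` match with a capture group is assigned to an array, it returns every captured value at once. The shortened markup and the names `@authors` and `%authorcount` are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: the same author markup repeated three times.
my $text = '<small class="author">J.K. Rowling</small>' x 3;
my $re   = qr{<small class="author">(.+?)</small>}i;

# In list context, a /g match returns all captured substrings in one list.
my @authors = $text =~ /$re/g;

# Tally how often each captured name occurs.
my %authorcount;
$authorcount{$_}++ for @authors;

print "$_ $authorcount{$_}\n" for sort keys %authorcount;   # prints "J.K. Rowling 3"
```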

The following code snippet demonstrates one of the many ways to solve it.

use strict;
use warnings;
use feature 'say';

my (%authors, $fname, $text, $re);
$fname = 'authorcount.txt';
$text  = '<span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small>';
# The /i flag goes on qr// itself: flags on the outer match do not apply
# to an interpolated qr// object.
$re    = qr/<span>by <small class="author" itemprop="author">(.*?)<\/small>/i;

# In list context, /g returns every captured author; count each one via $_.
$authors{$_}++ for $text =~ /$re/g;

open my $fh, ">", $fname
    or die "Can't open $fname: $!";

say $fh "$_ $authors{$_}" for sort keys %authors;
close $fh;

Note: this code works for your sample $text = '...'. If you intend to process complex HTML files, Mojo::DOM is the right tool for the job.
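
For comparison, a rough sketch of the same count with Mojo::DOM might look like the following. It assumes the Mojolicious distribution is installed from CPAN, and the CSS selector `small.author` is a guess based on the sample markup rather than something given in the thread.

```perl
use strict;
use warnings;
use Mojo::DOM;   # not a core module; ships with Mojolicious

# The sample markup from the question, repeated three times.
my $html = '<span>by <small class="author" itemprop="author">J.K. Rowling</small>' x 3;

my $dom = Mojo::DOM->new($html);

# find() returns a collection of matching elements; text gives each element's text content.
my %authorcount;
$authorcount{ $_->text }++ for $dom->find('small.author')->each;

print "$_ $authorcount{$_}\n" for sort keys %authorcount;
```

Because the parser works on elements instead of raw text, changes in attribute order or whitespace inside the tags do not break the match the way they would with the regex.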
