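Using Perl, how can I use a regex to take a string that has random HTML in it, with one HTML link with an anchor, like this:
<a href="http://example.com" target="_blank">Whatever Example</a>
and leave ONLY that, getting rid of everything else? No matter what else is inside the <a tag besides the href attribute, such as title=, style=, or whatever, it should leave the anchor text, "Whatever Example", and the </a>.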
You can use a stream parser such as HTML::TokeParser::Simple:
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
my $html = <<EO_HTML;
Using Perl, how can I use a regex to take a string that has random HTML in it
with one HTML link with anchor, like this:
<a href="http://example.com" target="_blank">Whatever <i>Interesting</i> Example</a>
and it leave ONLY that and get rid of everything else? No matter what
was inside the href attribute with the <a, like title=, or style=, or
whatever. and it leave the anchor: "Whatever Example" and the </a>?
EO_HTML
my $parser = HTML::TokeParser::Simple->new(string => $html);
while (my $tag = $parser->get_tag('a')) {
    print $tag->as_is, $parser->get_text('/a'), "</a>\n";
}
Output:
$ ./whatever.pl
<a href="http://example.com" target="_blank">Whatever Interesting Example</a>
If you need a simple regex solution, it could be something like:
my @anchors = $text =~ m@(<a[^>]*?>.*?</a>)@gsi;
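However, as @dan1111 mentioned, regular expressions are not the right tool for parsing HTML, for a variety of reasons.
If you need a robust solution, look for an HTML parser module.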
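
To make the regex approach concrete, here is a minimal sketch that uses only core Perl. The helper name extract_anchors and the sample strings are made up for illustration, and the pattern adds a \b after <a (a small variation on the one-liner above) so that tags such as <abbr> are not picked up. The second call shows one of the reasons the regex is fragile: it cannot tell that a link is inside an HTML comment.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): returns every
# <a ...>...</a> span that the simple regex finds in $text.
sub extract_anchors {
    my ($text) = @_;
    # \b stops <abbr>, <article>, etc. from matching;
    # /s lets . cross newlines, /i ignores case, /g collects all matches.
    return $text =~ m@(<a\b[^>]*>.*?</a>)@gsi;
}

my $text = <<'EO_HTML';
<p>Some <b>random</b> markup
<a href="http://example.com" target="_blank" title="x">Whatever Example</a>
and trailing text.</p>
EO_HTML

# Prints the anchor element, attributes included, and nothing else.
print "$_\n" for extract_anchors($text);

# Limitation: a commented-out link is still returned, because the regex
# knows nothing about HTML structure.
my @from_comment = extract_anchors('<!-- <a href="old">old</a> -->');
print scalar(@from_comment), " match(es) found inside a comment\n";
```

For input you do not control, the parser-based version earlier in this answer is the safer choice; the regex is only reasonable when the markup is known to be simple.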