How to use regular expressions in XPath with LibXML (Perl)



I need to search by attribute using a regular expression.

In Python, it looks like this:

from lxml import etree
dom = etree.parse(r'/path/to/file.XML')
regexpNS = "http://exslt.org/regular-expressions"
els = dom.xpath("//*[(re:test(@NAME, '.*Town.*', 'i')) and (@ISACTIVE='1' )]", namespaces={'re':regexpNS})
el = els[0]
print(el.attrib['NAME'] +" => " + el.attrib['OBJECTGUID'])

I don't understand how to do the same thing in Perl:

my $dom = XML::LibXML->new->parse_file("/path/to/file.XML");
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('re', 'http://exslt.org/regular-expressions');
print $xpc->findnodes(q{//*[(re:test(@NAME, '.*Town.*', 'i')) and (@ISACTIVE='1' )]});

This fails with the error message:

error : xmlXPathCompOpEval: function test not found
XPath error : Unregistered function at line ...

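I tried to adapt a well-known example from the documentation:

Custom XPath functions

This example demonstrates the registerFunction() method by defining a function that filters nodes based on a Perl regular expression: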

sub grep_nodes {
    my ($nodelist, $regexp) = @_;
    my $result = XML::LibXML::NodeList->new;
    for my $node ($nodelist->get_nodelist()) {
        $result->push($node) if $node->textContent =~ $regexp;
    }
    return $result;
};
my $xc = XML::LibXML::XPathContext->new($node);
$xc->registerFunction('grep_nodes', \&grep_nodes);
my @nodes = $xc->findnodes('//section[grep_nodes(para,"\bsearch(ing|es)?\b")]');

I reworked it as follows:

use XML::LibXML;
my $dom = XML::LibXML->new->parse_string(<<'EOT');
<?xml version="1.0" encoding="utf-8"?>
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOT

sub grep_attrs {
    my ($nodelist, $attr_name, $regexp) = @_;
    my $result = XML::LibXML::NodeList->new;
    for my $node ($nodelist->get_nodelist()) {
        my %attrs = map { $_->getName => $_->getValue } $node->attributes;
        $result->push($node) if $attrs{$attr_name} =~ $regexp;
        print $attrs{$attr_name}."\n" if $attrs{$attr_name} =~ $regexp;
    }
    return $result;
};
print "\n-========================================-\n";
my $xc = XML::LibXML::XPathContext->new($dom);
$xc->registerFunction('grep_attrs', \&grep_attrs);
my @nodes = $xc->findnodes(q{//*[grep_attrs(OBJECT,'NAME','.*Town.*')]});
print "\n-========================================-\n";
print @nodes;
print "\n-========================================-\n";

Output:

-========================================-
DownTown
MidthTown
UpTown
-========================================-
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1"/>
</ADDRESSOBJECTS>
-========================================-

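The function works, but:

  1. It is too long, many times longer than the Python version.
  2. For some reason, it returns the whole tree instead of the matched nodes.

Please help me understand the problem. How do I use a regular expression when searching by attribute?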

Use XML::XPath instead. It supports regular expressions through the XPath 2.0 matches() function, thanks to yours truly:

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use XML::XPath v1.45;
my $xml = <<'EOXML';
<?xml version="1.0" encoding="utf-8"?>
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOXML
my $xp = XML::XPath->new(xml => $xml);
my @nodes = $xp->findnodes(q{//*[matches(@NAME, 'town', 'i') and @ISACTIVE = 1]});
for my $node (@nodes) {
say $node->getAttribute('NAME');
}

This prints:

DownTown
MidthTown
UpTown

A solution using the XML::LibXML module and a registered function:

#!/usr/bin/env perl
use utf8;
use warnings;
use strict;
use feature qw/say/;
use XML::LibXML;
my $xml = <<'EOXML';
<?xml version="1.0" encoding="utf-8"?>
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="2" NAME="Times Square" TYPENAME="sq" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="4" NAME="MidthTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
<OBJECT ID="5" NAME="UpTown" TYPENAME="st" LEVEL="8" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOXML
my $dom = XML::LibXML->new->parse_string($xml);
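sub xpath_matches {
    my ($input, $pattern, $flg) = @_;
    $flg = '' if !defined($flg);
    # Prepend inline modifiers such as (?i) only when flags were passed
    my $re = length($flg) ? qr/(?$flg)$pattern/ : qr/$pattern/;
    return 1 if $input =~ $re;
    return undef;
}
my $xc = XML::LibXML::XPathContext->new($dom);
# registerFunction expects a code reference, hence the backslash
$xc->registerFunction('matches', \&xpath_matches);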
say $_->getAttribute('NAME').' '.$_->getAttribute('TYPENAME')
for $xc->findnodes(q{
//OBJECT[@NAME and matches(@NAME,'.*[tT]oWn$','i')]
});

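PS: Perl has no boolean type, so matches returns a plain value (1 on a match). When the predicate consists of nothing but the function call, XPath treats a numeric result as a position test, so the result has to be compared with a value, in this case 1: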

//OBJECT[matches(@NAME,'.*[tT]oWn$','i')=1]

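If the predicate contains other operations, the result is converted to a boolean, so the function can be used without the comparison, as in the example above: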

//OBJECT[@NAME and matches(@NAME,'.*[tT]oWn$','i')]

//OBJECT[matches(@NAME,'.*[tT]oWn$','i') and 1]
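Another way to avoid the comparison is to return an XPath boolean from the callback instead of the number 1. The following is a minimal sketch, assuming the installed XML::LibXML provides the XML::LibXML::Boolean class with its True and False constructors. The function name matches_bool and the shortened sample data are made up for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened sample data (illustrative subset of the document above)
my $dom = XML::LibXML->new->parse_string(<<'EOXML');
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" ISACTIVE="1" />
<OBJECT ID="5" NAME="UpTown" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOXML

# Hypothetical helper: returns an XPath boolean rather than the number 1,
# so a bare predicate is not read as a position test.
sub matches_bool {
    my ($input, $pattern, $flags) = @_;
    # Arguments arrive as objects; stringify them explicitly
    # (this relies on the same string overloading the code above uses).
    my ($text, $pat) = ("$input", "$pattern");
    my $mods = defined $flags ? "$flags" : '';
    my $re = length $mods ? qr/(?$mods)$pat/ : qr/$pat/;
    return $text =~ $re
        ? XML::LibXML::Boolean->True
        : XML::LibXML::Boolean->False;
}

my $xc = XML::LibXML::XPathContext->new($dom);
$xc->registerFunction('matches_bool', \&matches_bool);

# No "= 1" and no "and 1" needed in the predicate
my @names = map { $_->getAttribute('NAME') }
    $xc->findnodes(q{//OBJECT[matches_bool(@NAME, 'town$', 'i')]});
print "$_\n" for @names;
```

With a boolean return value the predicate behaves the same whether or not it is combined with other conditions.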

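On very large XML files this solution is almost 15 times faster than XML::XPath. On files of hundreds of megabytes, XML::XPath simply dies.

All that is needed is to write and use the registered function correctly.

So regular expressions can be used in XPath with XML::LibXML!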
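The XPath expression from the Python snippet in the question can also be kept unchanged by registering a function named test in the EXSLT regular-expressions namespace with registerFunctionNS. The following is only a sketch under stated assumptions, not an EXSLT implementation: it uses Perl regex syntax, honours only the modifiers i, m, s and x, and silently ignores anything else (such as the EXSLT g flag). The sample data is a modified subset in which one object is marked inactive.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative data: MidthTown is set to ISACTIVE="0" for this sketch
my $dom = XML::LibXML->new->parse_string(<<'EOXML');
<ADDRESSOBJECTS>
<OBJECT ID="1" NAME="Broadway" ISACTIVE="1" />
<OBJECT ID="3" NAME="DownTown" ISACTIVE="1" />
<OBJECT ID="4" NAME="MidthTown" ISACTIVE="0" />
<OBJECT ID="5" NAME="UpTown" ISACTIVE="1" />
</ADDRESSOBJECTS>
EOXML

my $re_ns = 'http://exslt.org/regular-expressions';
my $xpc   = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('re', $re_ns);

# Stand-in for re:test(string, pattern, flags), written in Perl
$xpc->registerFunctionNS('test', $re_ns, sub {
    my ($input, $pattern, $flags) = @_;
    my ($text, $pat) = ("$input", "$pattern");   # stringify argument objects
    my $mods = defined $flags ? "$flags" : '';
    $mods =~ s/[^imsx]//g;                       # keep inline-safe modifiers only
    my $re = length $mods ? qr/(?$mods)$pat/ : qr/$pat/;
    return $text =~ $re
        ? XML::LibXML::Boolean->True
        : XML::LibXML::Boolean->False;
});

# Same query as in the Python version of the question
my @names = map { $_->getAttribute('NAME') } $xpc->findnodes(
    q{//*[(re:test(@NAME, '.*Town.*', 'i')) and (@ISACTIVE='1' )]}
);
print "$_\n" for @names;
```

Because the function is bound to the namespace URI rather than to the prefix, any prefix registered for that URI with registerNs resolves to it.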
