I have an XML file that looks like this:
<?xml version="1.0"?>
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
    <date>2017-03-16</date>
  </header>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <indexsubfamily>iTraxx Rest of Asia</indexsubfamily>
    <paymentfrequency>3M</paymentfrequency>
    <recoveryrate>0.35</recoveryrate>
    <constituents>
      <constituent>
        <refentity>
          <originalconstituent>
            <referenceentity>ICICI Bank Limited</referenceentity>
            <redentitycode>Y1BDCC</redentitycode>
            <role>Issuer</role>
            <redpaircode>Y1BDCCAA9</redpaircode>
            <jurisdiction>India</jurisdiction>
            <tier>SNRFOR</tier>
            <pairiscurrent>false</pairiscurrent>
            <pairvalidfrom>2002-03-30</pairvalidfrom>
            <pairvalidto>2008-10-22</pairvalidto>
            <ticker>ICICIB</ticker>
            <ispreferred>false</ispreferred>
            <docclause>CR</docclause>
            <recorddate>2014-02-25</recorddate>
            <weight>0.0769</weight>
          </originalconstituent>
        </refentity>
        <refobligation>
          <type>Bond</type>
          <isconvert>false</isconvert>
          <isperp>false</isperp>
          <coupontype>Fixed</coupontype>
          <ccy>USD</ccy>
          <maturity>2008-10-22</maturity>
          <coupon>0.0475</coupon>
          <isin>XS0178885876</isin>
          <cusip>Y38575AQ2</cusip>
          <event>Matured</event>
          <obligationname>ICICIB 4.75 22Oct08</obligationname>
          <prospectusinfo>
            <issuers>
              <origissuersasperprosp>ICICI Bank Limited</origissuersasperprosp>
            </issuers>
          </prospectusinfo>
        </refobligation>
      </constituent>
    </constituents>
  </index>
</data>
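I want to walk through this file without knowing the tag names. My end goal is to build a hash of tag names and their values.
I don't want to call findnodes with an XPath for every node. That defeats the whole purpose of writing a generic loader.
I'm also using XML-LibXML-2.0126, which is a slightly older version.
Below is part of the code that uses findnodes. The XML has also been shortened to avoid a lengthy question, which it has now become anyway :)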
use XML::LibXML;

my $parser = XML::LibXML->new;
my $xmldoc = $parser->parse_file( $fileName );
my $root   = $xmldoc->getDocumentElement() || die( "Could not get Document Element \n" );

foreach my $index ( $root->findnodes( "index" ) ) {    # $root->getChildNodes()) # Get all the Indexes
    foreach my $constituent ( $index->findnodes( 'constituents/constituent' ) ) {    # Let's pick up all Constituents
        # This is a crude way. We should be iterating without knowing what's inside
        my $referenceentity = $constituent->findnodes( 'refentity/originalconstituent/referenceentity' );
        print "referenceentity :" . $referenceentity . "\n";
        print "+++++++++++++++++++++++++++++++++++ \n";
    }
}
Use the nonBlankChildNodes, nodeName, and textContent methods provided by XML::LibXML::Node:
my %hash;
for my $node ( $oc->nonBlankChildNodes ) {
    my $tag   = $node->nodeName;
    my $value = $node->textContent;
    $hash{$tag} = $value;
}
This is equivalent to:
my %hash = map { $_->nodeName, $_->textContent } $oc->nonBlankChildNodes;
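To go deeper than a single level, the same three methods can be applied recursively. The following is only a minimal sketch under stated assumptions: element_to_hash is a hypothetical helper name, the inline XML is a cut-down copy of the header from the question, elements are assumed to carry no attributes, and sibling tag names are assumed to be unique (a repeated name simply overwrites the earlier entry).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns the text for a leaf element,
# or a hash reference keyed by child tag name otherwise.
sub element_to_hash {
    my ($elem) = @_;

    # Keep only element children; leaf text is read via textContent below.
    my @children = grep { $_->nodeType == XML_ELEMENT_NODE } $elem->nonBlankChildNodes;
    return $elem->textContent unless @children;

    my %hash;
    for my $child (@children) {
        # Assumption: tag names are unique among siblings.
        $hash{ $child->nodeName } = element_to_hash($child);
    }
    return \%hash;
}

# Cut-down sample taken from the question's XML.
my $xml = <<'XML';
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
  </header>
</data>
XML

my $doc  = XML::LibXML->load_xml( string => $xml );
my $root = $doc->documentElement;
my %data = ( $root->nodeName => element_to_hash($root) );

print $data{data}{header}{name}, "\n";    # prints "V9 Red Indices"
```

With the full file from the question, repeated constituent elements would need to be collected into an array instead, which this sketch does not attempt.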
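Are you sure you want this? Accessing arbitrary data from a parsed XML::LibXML::Document object is just as simple as accessing it from a nested Perl hash. A hash will certainly take up less memory than the equivalent object, if that is your intention, but from your question that doesn't seem to be the case.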
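As a small illustration of that point, here is a sketch of reading values straight from the parsed document with findvalue. The inline XML is a cut-down copy of the header from the question, and the XPath expressions assume that structure.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down sample taken from the question's XML.
my $xml = <<'XML';
<data>
  <header>
    <name>V9 Red Indices</name>
    <date>2017-03-16</date>
  </header>
</data>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Read single values directly from the document object.
my $name = $doc->findvalue('/data/header/name');
my $date = $doc->findvalue('/data/header/date');

# Roughly what $hash{data}{header}{name} would give with a nested hash.
print "$name ($date)\n";    # prints "V9 Red Indices (2017-03-16)"
```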
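You can do this easily with the XML::Parser module, which calls a callback each time an "event" occurs in the XML data. In this case, all we're interested in is an opening tag, a closing tag, and a text string.
This example code builds a nested hash from the XML. It dies with an appropriate message if the XML data is malformed (a closing tag doesn't match the name of its opening tag), or if any element has one or more attributes (which cannot be represented in this structure).
I've used Data::Dump to display the result.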
use strict;
use warnings 'all';

use XML::Parser;
use Data::Dump;

my $parser = XML::Parser->new(
    Style    => 'Debug',
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    },
);

my %data;
my @data_stack = ( \%data );    # stack of hash refs, innermost last
my @elem_stack;                 # stack of open element names

$parser->parsefile( 'index.xml' );

dd \%data;

# Opening tag: create a new nested hash and descend into it
sub handle_start {
    my ($expat, $elem) = @_;
    my $data = $data_stack[-1]{$elem} = { };
    push @data_stack, $data;
    push @elem_stack, $elem;
    if ( @_ > 2 ) {
        my $xpath = join '', map "/$_", @elem_stack;
        die qq{Element at $xpath has attributes};
    }
}

# Closing tag: check it matches the open element, then ascend
sub handle_end {
    my ($expat, $elem) = @_;
    my $top_elem = pop @elem_stack;
    die qq{Bad XML structure $elem <=> $top_elem} unless $elem eq $top_elem;
    pop @data_stack;
}

# Text: replace the current element's empty hash with the string
sub handle_char {
    my ($expat, $str) = @_;
    return unless $str =~ /\S/;
    my $top_elem = $elem_stack[-1];
    $data_stack[-2]{$top_elem} = $str;
}
Output
{
  data => {
    header => {
      date => "2017-03-16",
      name => "V9 Red Indices",
      version => 9,
    },
    index => {
      constituents => {
        constituent => {
          refentity => {
            originalconstituent => {
              docclause => "CR",
              ispreferred => "false",
              jurisdiction => "India",
              pairiscurrent => "false",
              pairvalidfrom => "2002-03-30",
              pairvalidto => "2008-10-22",
              recorddate => "2014-02-25",
              redentitycode => "Y1BDCC",
              redpaircode => "Y1BDCCAA9",
              referenceentity => "ICICI Bank Limited",
              role => "Issuer",
              ticker => "ICICIB",
              tier => "SNRFOR",
              weight => 0.0769,
            },
          },
          refobligation => {
            ccy => "USD",
            coupon => 0.0475,
            coupontype => "Fixed",
            cusip => "Y38575AQ2",
            event => "Matured",
            isconvert => "false",
            isin => "XS0178885876",
            isperp => "false",
            maturity => "2008-10-22",
            obligationname => "ICICIB 4.75 22Oct08",
            prospectusinfo => {
              issuers => {
                origissuersasperprosp => "ICICI Bank Limited"
              },
            },
            type => "Bond",
          },
        },
      },
      indexfamily => "ITRAXX-Asian",
      indexsubfamily => "iTraxx Rest of Asia",
      paymentfrequency => "3M",
      recoveryrate => 0.35,
    },
  },
}