Iterating over elements with libxml in Perl



I have an XML file that looks like this:

<?xml version="1.0"?>
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
    <date>2017-03-16</date>
  </header>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <indexsubfamily>iTraxx Rest of Asia</indexsubfamily>                
    <paymentfrequency>3M</paymentfrequency>
    <recoveryrate>0.35</recoveryrate>
    <constituents>
      <constituent>
        <refentity>
          <originalconstituent>
            <referenceentity>ICICI Bank Limited</referenceentity>
            <redentitycode>Y1BDCC</redentitycode>
            <role>Issuer</role>
            <redpaircode>Y1BDCCAA9</redpaircode>
            <jurisdiction>India</jurisdiction>
            <tier>SNRFOR</tier>
            <pairiscurrent>false</pairiscurrent>
            <pairvalidfrom>2002-03-30</pairvalidfrom>
            <pairvalidto>2008-10-22</pairvalidto>
            <ticker>ICICIB</ticker>
            <ispreferred>false</ispreferred>
            <docclause>CR</docclause>
            <recorddate>2014-02-25</recorddate>
            <weight>0.0769</weight>
          </originalconstituent>
        </refentity>
        <refobligation>
          <type>Bond</type>
          <isconvert>false</isconvert>
          <isperp>false</isperp>
          <coupontype>Fixed</coupontype>
          <ccy>USD</ccy>
          <maturity>2008-10-22</maturity>
          <coupon>0.0475</coupon>
          <isin>XS0178885876</isin>
          <cusip>Y38575AQ2</cusip>
          <event>Matured</event>
          <obligationname>ICICIB 4.75 22Oct08</obligationname>
          <prospectusinfo>
            <issuers>                                                        
              <origissuersasperprosp>ICICI Bank Limited</origissuersasperprosp>
            </issuers>
          </prospectusinfo>
        </refobligation>
      </constituent>
    </constituents>
  </index>
</data>

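I want to iterate over this file without knowing the tag names. My end goal is to create a hash of tag names and values.

I do not want to call findnodes with an XPath expression for every node. That defeats the whole purpose of writing a generic loader.

I am also using XML-LibXML-2.0126, a slightly older version.

Below is part of my code, which uses findnodes. The XML has also been shortened to avoid a lengthy question, which it has now become anyway :)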

use XML::LibXML;
my $parser = XML::LibXML->new;
my $xmldoc = $parser->parse_file( $fileName );
my $root = $xmldoc->getDocumentElement() || die( "Could not get Document Element \n" );
foreach my $index ( $root->findnodes( "index" ) ) {    # $root->getChildNodes()) # Get all the Indexes
    foreach my $constituent ( $index->findnodes( 'constituents/constituent' ) ) { # Let's pick up all Constituents
        my $referenceentity = $constituent->findnodes( 'refentity/originalconstituent/referenceentity' );    # This is a crude way. We should be iterating without knowing what's inside
        print "referenceentity :" . $referenceentity . "\n";
        print "+++++++++++++++++++++++++++++++++++ \n";
    }
}

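Use the nonBlankChildNodes, nodeName, and textContent methods provided by XML::LibXML::Node. Here $oc is the element whose children you want to collect, for example the originalconstituent node: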

my %hash;
for my $node ( $oc->nonBlankChildNodes ) {
    my $tag = $node->nodeName;
    my $value = $node->textContent;
    $hash{$tag} = $value;
}

This is equivalent to:

my %hash = map { $_->nodeName, $_->textContent } $oc->nonBlankChildNodes;
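The snippets above flatten a single level. To handle the whole document without naming any tag, the same three methods can be applied recursively. The following is a minimal sketch, not a definitive implementation: element_to_hash is a hypothetical helper name, the XML is a shortened copy of the sample from the question, and the sketch assumes no attributes and no repeated sibling tag names (a repeated name would overwrite the earlier entry).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: convert an element into either its text (leaf)
# or a hash reference mapping child tag names to converted children.
sub element_to_hash {
    my ($elem) = @_;

    # Keep only element children; whitespace-only text is already skipped.
    my @children = grep { $_->nodeType == XML_ELEMENT_NODE }
                   $elem->nonBlankChildNodes;

    # A leaf element is represented by its text content.
    return $elem->textContent unless @children;

    my %hash;
    $hash{ $_->nodeName } = element_to_hash($_) for @children;
    return \%hash;
}

# Shortened sample data (assumption: same shape as the file in the question).
my $xml = <<'XML';
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
  </header>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <constituents>
      <constituent>
        <refentity>
          <originalconstituent>
            <referenceentity>ICICI Bank Limited</referenceentity>
            <ticker>ICICIB</ticker>
          </originalconstituent>
        </refentity>
      </constituent>
    </constituents>
  </index>
</data>
XML

my $doc  = XML::LibXML->load_xml( string => $xml );
my $data = element_to_hash( $doc->documentElement );

# Prints "ICICIB"
print $data->{index}{constituents}{constituent}{refentity}{originalconstituent}{ticker}, "\n";
```

If the real file contains several constituent elements, the values would need to be collected into arrays instead of being assigned directly.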

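Are you sure you want this? Accessing arbitrary data from a parsed XML::LibXML::Document object is about as simple as accessing it from a nested Perl hash. A hash will certainly take up less memory than the equivalent object, if that is your intention, but it does not seem so from your question.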
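To illustrate the comparison, here is a small sketch of reading values straight from the parsed document with findvalue, next to the hash lookup each one would replace. The XML is a cut-down sample and the paths are assumptions based on the structure in the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down sample (assumption: same element names as the question's file).
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<data>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <recoveryrate>0.35</recoveryrate>
  </index>
</data>
XML

# With a nested hash this would be $data->{index}{recoveryrate};
# against the document it is a single XPath expression.
my $rate   = $doc->findvalue('/data/index/recoveryrate');
my $family = $doc->findvalue('/data/index/indexfamily');

# Prints "ITRAXX-Asian: 0.35"
print "$family: $rate\n";
```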

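Building such a hash is straightforward with the XML::Parser module, which calls a callback every time an "event" occurs in the XML data. In this case, the only events of interest are an opening tag, a closing tag, and a text string.

This example code builds a nested hash from the XML. It dies with an appropriate message if the XML data is malformed (a closing tag's name does not match the opening tag's) or if any element has one or more attributes (which cannot be represented in this structure).

I have used Data::Dump to display the result.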

use strict;
use warnings 'all';
use XML::Parser;
use Data::Dump;
my $parser = XML::Parser->new(
    Style    => 'Debug',
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    },
);

my %data;
my @data_stack = ( \%data );    # hashes of the currently open elements, innermost last
my @elem_stack;                 # names of the currently open elements
$parser->parsefile( 'index.xml' );
dd \%data;

# Opening tag: create a new hash under the parent and descend into it
sub handle_start {
    my ($expat, $elem) = @_;
    my $data = $data_stack[-1]{$elem} = { };
    push @data_stack, $data;
    push @elem_stack, $elem;
    if ( @_ > 2 ) {    # any extra arguments are attribute name/value pairs
        my $xpath = join '', map "/$_", @elem_stack;
        die qq{Element at $xpath has attributes};
    }
}

# Closing tag: check that it matches the open element, then ascend
sub handle_end {
    my ($expat, $elem) = @_;
    my $top_elem = pop @elem_stack;
    die qq{Bad XML structure $elem <=> $top_elem} unless $elem eq $top_elem;
    pop @data_stack;
}

# Text: replace the current element's empty hash with the string
sub handle_char {
    my ($expat, $str) = @_;
    return unless $str =~ /\S/;    # ignore whitespace-only text
    my $top_elem = $elem_stack[-1];
    $data_stack[-2]{$top_elem} = $str;
}

Output

{
    data => {
        header => {
            date => "2017-03-16",
            name => "V9 Red Indices",
            version => 9,
        },
        index  => {
            constituents => {
                constituent => {
                    refentity => {
                        originalconstituent => {
                            docclause       => "CR",
                            ispreferred     => "false",
                            jurisdiction    => "India",
                            pairiscurrent   => "false",
                            pairvalidfrom   => "2002-03-30",
                            pairvalidto     => "2008-10-22",
                            recorddate      => "2014-02-25",
                            redentitycode   => "Y1BDCC",
                            redpaircode     => "Y1BDCCAA9",
                            referenceentity => "ICICI Bank Limited",
                            role            => "Issuer",
                            ticker          => "ICICIB",
                            tier            => "SNRFOR",
                            weight          => 0.0769,
                        },
                    },
                    refobligation => {
                        ccy            => "USD",
                        coupon         => 0.0475,
                        coupontype     => "Fixed",
                        cusip          => "Y38575AQ2",
                        event          => "Matured",
                        isconvert      => "false",
                        isin           => "XS0178885876",
                        isperp         => "false",
                        maturity       => "2008-10-22",
                        obligationname => "ICICIB 4.75 22Oct08",
                        prospectusinfo => {
                            issuers => {
                                origissuersasperprosp => "ICICI Bank Limited"
                            },
                        },
                        type => "Bond",
                    },
                },
            },
            indexfamily      => "ITRAXX-Asian",
            indexsubfamily   => "iTraxx Rest of Asia",
            paymentfrequency => "3M",
            recoveryrate     => 0.35,
        },
    },
}
