Perl XML::LibXML: getting information from specific nodes



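I realize there are many similar questions, but I still can't find the specific answer I'm looking for.

I'm using Perl with the XML::LibXML library to read information from an XML file. The file has many nodes with many child nodes (which have children of their own, and so on). I'm trying to extract the information node by node, but I'm really stuck on how to do that.

Here is an example of what I'm trying to do: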

#!/usr/bin/perl -w
use XML::LibXML;
open(my $xml_fh, '<', 'test.xml') or die "Cannot open test.xml: $!";
my $dom = XML::LibXML->load_xml(IO => $xml_fh);
close($xml_fh);
foreach my $chapter ($dom->findnodes('/file/chapter')) {
    my $chapterNumber = $chapter->findvalue('@number');
    print "Chapter #$chapterNumber\n";
    # I tried $dom->findnodes('/file/chapter/section') <-- spelling out the XPath, with the same results
    foreach my $section ($dom->findnodes('//section')) {
        my $sectionNumber = $section->findvalue('@number');
        print " Section #$sectionNumber\n";
        foreach my $subsection ($dom->findnodes('//subsection')) {
            my $subsectionNumber = $subsection->findvalue('@number');
            print "  SubSection $subsectionNumber\n";
        }
    }
}

This particular XML file is laid out like this:

<file>
  <chapter number="1">
    <section number="abc123">
      There is some data here I'd like to get to
      <subsection number="abc123.(s)(4)">
        Some additional data here
        <subsection number="deeperSubSec">
          There might even be deeper subsections
        </subsection>
      </subsection>
    </section>
  </chapter>
  <chapter number="208">
    <section number="dgfj23">
      There is some data here I'd like to get to also
      <subsection number="dgfj23.(s)(4)">
        Some additional data here also
        <subsection number="deeperSubSec44">
          There might even be deeper subsections also
        </subsection>
      </subsection>
    </section>
  </chapter>
  <chapter number="998">
    <section number="xxxid">
      There is even more data here I'd like to get to also
      <subsection number="xxxid.(s)(4)">
        Some additional data also here too
        <subsection number="deeperSubSec999">
          There might even be deeper subsections also again
        </subsection>
      </subsection>
    </section>
  </chapter>
</file>

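Unfortunately, all I end up with is a list of repeated data. I'm sure it's because of my nested for loops, but I don't really have a basic understanding of how to work with this kind of data. Hopefully someone can offer some resources or insight.

This is my current output: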

Chapter #1
 Section #abc123
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
 Section #dgfj23
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
 Section #xxxid
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
Chapter #208
 Section #abc123
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
 Section #dgfj23
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
 Section #xxxid
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
Chapter #998
 Section #abc123
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
 Section #dgfj23
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999
 Section #xxxid
  SubSection abc123.(s)(4)
  SubSection deeperSubSec
  SubSection dgfj23.(s)(4)
  SubSection deeperSubSec44
  SubSection xxxid.(s)(4)
  SubSection deeperSubSec999

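So for each chapter I read every section in the file, and then for each section every subsection in the file, over and over.

What I'd like to do is read each chapter and its own sections, then for each section its own subsections, and any sub-subsections nested inside them.

Like this: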

Chapter #1
 Section #abc123
  Subsection #abc123.(s)(4)
   Sub-Subsection #deeperSubSec
Chapter #208
 Section #dgfj23
  Subsection #dgfj23.(s)(4)
   Sub-Subsection #deeperSubSec44
etc...

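Eventually, once I understand how the basic operations work, I'll also need to access the data contained in each section, subsection, sub-subsection, and so on. But I figure I need to walk before I run, so I'm starting with simply getting the attribute values.

Thanks for your help.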

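So I think I figured it out. I had been operating on the $dom object, which contains the whole XML tree, so an XPath such as //section matched every section in the file on every pass through the loop. I believe what I need to do instead is run a relative XPath on the node I'm currently looking at, like this: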

#!/usr/bin/perl -w
use XML::LibXML;
open(my $xml_fh, '<', 'test.xml') or die "Cannot open test.xml: $!";
my $dom = XML::LibXML->load_xml(IO => $xml_fh);
close($xml_fh);

for my $chapter ($dom->findnodes('/file/chapter')) {
    print "Chapter #" . $chapter->findvalue('@number') . "\n";
    # Relative XPath: only the <section> children of this chapter
    foreach my $section ($chapter->findnodes('section')) {
        print " Section #" . $section->findvalue('@number') . "\n";
        # Only the direct <subsection> children of this section
        foreach my $subsection ($section->findnodes('subsection')) {
            print "  Subsection #" . $subsection->findvalue('@number') . "\n";
        }
    }
}

This makes the output look much more like what I was hoping for:

Chapter #1
 Section #abc123
  Subsection #abc123.(s)(4)
Chapter #208
 Section #dgfj23
  Subsection #dgfj23.(s)(4)
Chapter #998
 Section #xxxid
  Subsection #xxxid.(s)(4)
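
To make the difference between the two approaches concrete, here is a minimal sketch that counts how many section nodes each XPath form matches when it is called on a single chapter node. The inline XML is a trimmed version of the sample file, and count_sections is a hypothetical helper name used only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed version of the sample file, inlined so the sketch is self-contained.
my $xml = <<'XML';
<file>
  <chapter number="1">
    <section number="abc123">
      <subsection number="abc123.(s)(4)"/>
    </section>
  </chapter>
  <chapter number="208">
    <section number="dgfj23">
      <subsection number="dgfj23.(s)(4)"/>
    </section>
  </chapter>
</file>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my ($chapter) = $dom->findnodes('/file/chapter[@number="208"]');

# Hypothetical helper: count the <section> matches for each XPath form,
# always evaluated with $node as the context node.
sub count_sections {
    my ($node) = @_;
    my %count;
    for my $xpath ('//section', './/section', 'section') {
        my @found = $node->findnodes($xpath);
        $count{$xpath} = scalar @found;
    }
    return %count;
}

my %count = count_sections($chapter);
printf "%-10s matches %d section(s)\n", $_, $count{$_}
    for ('//section', './/section', 'section');
```

With this two-chapter document, '//section' matches 2 nodes even though it is called on one chapter, because a path starting with / is resolved from the document root. The relative forms './/section' (any depth below the chapter) and 'section' (direct children only) each match 1.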

Here is a cleaner example that helps illustrate that each loop now works on the specific part of the tree obtained from the enclosing loop:

#!/usr/bin/perl -w
use XML::LibXML;
open(my $xml_fh, '<', 'test.xml') or die "Cannot open test.xml: $!";
my $dom = XML::LibXML->load_xml(IO => $xml_fh);
close($xml_fh);

my @chapters = $dom->findnodes('/file/chapter');
for my $chapter (@chapters) {
    my $chapterNo = $chapter->findvalue('@number');
    print "Chapter #$chapterNo\n";
    # Sections of this chapter only
    my @sections = $chapter->findnodes('section');
    for my $section (@sections) {
        my $sectionNo = $section->findvalue('@number');
        print " Section #$sectionNo\n";
        # Direct subsections of this section only
        my @subsections = $section->findnodes('subsection');
        for my $subsection (@subsections) {
            my $subsectionNo = $subsection->findvalue('@number');
            print "  Subsection #$subsectionNo\n";
        }
    }
}
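
The loops above stop at the first level of subsections, so the nested ones (deeperSubSec and so on) are never printed. One possible way to reach any depth, and to pick up each node's own text along the way, is a small recursive walk. This is only a sketch: walk and own_text are hypothetical helper names, the inline XML is one chapter copied from the sample file, and it assumes every element of interest carries a number attribute.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# One chapter from the sample file, inlined so the sketch is self-contained.
my $xml = <<'XML';
<file>
  <chapter number="1">
    <section number="abc123">
      There is some data here I'd like to get to
      <subsection number="abc123.(s)(4)">
        Some additional data here
        <subsection number="deeperSubSec">
          There might even be deeper subsections
        </subsection>
      </subsection>
    </section>
  </chapter>
</file>
XML

# Hypothetical helper: text that belongs to this node itself (its direct
# text children), excluding the text of nested elements.
sub own_text {
    my ($node) = @_;
    my $text = join ' ', map { $_->textContent } $node->findnodes('text()');
    $text =~ s/\s+/ /g;          # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;     # trim both ends
    return $text;
}

# Hypothetical helper: record one line for $node, then recurse into every
# child element that has a number attribute. Indentation shows the depth.
sub walk {
    my ($node, $depth, $lines) = @_;
    my $line = ('  ' x $depth) . ucfirst($node->nodeName)
             . ' #' . $node->getAttribute('number');
    my $text = own_text($node);
    $line .= ": $text" if length $text;
    push @$lines, $line;
    for my $child ($node->findnodes('*[@number]')) {
        walk($child, $depth + 1, $lines);
    }
    return $lines;
}

my $dom = XML::LibXML->load_xml(string => $xml);
my @lines;
walk($_, 0, \@lines) for $dom->findnodes('/file/chapter');
print "$_\n" for @lines;
```

Because each recursive call runs the relative XPath '*[@number]' on the current node, a subsection nested inside another subsection is visited exactly once, under its own parent. For the inline document this prints four lines, one per element, each indented two spaces deeper than its parent.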
