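As a follow-up to my last question (Perl XML::LibXML: getting information from a specific node):
Given the following XML data, I can't figure out how to get the data that appears after the <tab/> tag (which has no closing tag) without also getting all the data from the child nodes in that section. See below for more details.
XML example: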
<title number="3">
<catchline>Uniform Agricultural Cooperative Association Act</catchline>
<chapter number="3-1">
<catchline>
General Provisions Relating to Agricultural Cooperative Associations
</catchline>
<section number="3-1-1">
<histories>
<history>
Amended by Chapter
<modchap sess="2010GS">378</modchap>
, 2010 General Session
</history>
<modyear>2010</modyear>
</histories>
<catchline>Declaration of policy.</catchline>
<tab/>
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed. THIS IS THE DATA THAT I WANT TO GET
</section>
<section number="3-1-1.1">
<histories>
<history>
Amended by Chapter
<modchap sess="1996GS">79</modchap>
, 1996 General Session
</history>
<modyear>1996</modyear>
</histories>
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">
Title 16, Chapter 10a, Utah Revised Business Corporation Act
</xref>
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
,
<xref depth="3" refnumber="3-1-13.7" start="0">3-1-13.7</xref>
, and
<xref depth="3" refnumber="3-1-16.1" start="0">3-1-16.1</xref>
.
</section>
</chapter>
</title>
Here is my current Perl script:
#!/usr/bin/perl -w
use strict;
use XML::LibXML;

my $dom = XML::LibXML->load_xml(location => "file.xml");
my $titleName = $dom->findvalue('/title/catchline');
print "Title $titleName\n";

my @chapters = $dom->findnodes('/title/chapter');
for my $chapter (@chapters) {
    my $chapterNo   = $chapter->getAttribute('number');
    my $chapterName = $chapter->findvalue('catchline');
    print " Chapter #$chapterNo - $chapterName\n";

    my @sections = $chapter->findnodes('section');
    for my $section (@sections) {
        my $sectionNo   = $section->getAttribute('number');
        my $sectionName = $section->findvalue('catchline');
        my $sectionData = $section->textContent;   # ALL descendant text of <section>
        print " Section #$sectionNo - $sectionName\nSECDATA: $sectionData\n\n";
    }
}
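This works, but it does exactly what I asked it to do: the $sectionData variable contains everything inside <section>.
What I'm trying to do is get only the data after the <tab/> tag, and nothing from child tags such as <histories>, <history>, <xref>, etc.
For example, the string:
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
is not contained in any particular tag. How can I get that data?
The current output is: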
Title Uniform Agricultural Cooperative Association Act
Chapter #3-1 -
General Provisions Relating to Agricultural Cooperative Associations
Section #3-1-1 - Declaration of policy.
SECDATA:
Amended by Chapter
378
, 2010 General Session
2010
Declaration of policy.
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.
Section #3-1-1.1 - General corporation laws do not apply.
SECDATA:
Amended by Chapter
79
, 1996 General Session
1996
General corporation laws do not apply.
Title 16, Chapter 10a, Utah Revised Business Corporation Act
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
3-1-13.4
,
3-1-13.7
, and
3-1-16.1
.
But what I'm looking for is more like:
Title Uniform Agricultural Cooperative Association Act
Chapter #3-1 -
General Provisions Relating to Agricultural Cooperative Associations
Section #3-1-1 - Declaration of policy.
SECDATA:
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.
Section #3-1-1.1 - General corporation laws do not apply.
SECDATA:
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
If you want all the nodes that follow the tab element (i.e. both element nodes and text nodes), you can use:
my @post_tab_nodes = $section_node->findnodes('tab/following-sibling::node()');
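A minimal, self-contained sketch of that query, assuming a trimmed copy of section 3-1-1.1 is embedded inline instead of being read from file.xml. It prints each node found after <tab/>, labels it as a text node or an element, and skips the whitespace-only text nodes that the line breaks in the XML produce.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;   # exports XML_TEXT_NODE and the other node-type constants

# Trimmed copy of section 3-1-1.1 from the question (<histories> omitted).
my $xml = <<'XML';
<section number="3-1-1.1">
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">Title 16, Chapter 10a, Utah Revised Business Corporation Act</xref>
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
.
</section>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my ($section_node) = $dom->findnodes('/section');

# Every sibling after <tab/>: text nodes and elements, in document order.
my @post_tab_nodes = $section_node->findnodes('tab/following-sibling::node()');

for my $node (@post_tab_nodes) {
    (my $text = $node->textContent) =~ s/^\s+|\s+$//g;   # trim for display
    next unless length $text;                           # skip blank text nodes
    my $kind = $node->nodeType == XML_TEXT_NODE ? 'text' : '<' . $node->nodeName . '>';
    print "$kind: $text\n";
}
```

The list alternates between <xref> elements and the loose text between them, which is why a plain textContent on the whole <section> cannot separate the two.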
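Rendering the resulting nodes as text is left as an exercise for the reader. You can use $node->nodeType to tell element nodes from text nodes: it returns XML_ELEMENT_NODE for elements and XML_TEXT_NODE for text nodes (both constants are exported by XML::LibXML).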
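One way to do that rendering, as a hedged sketch: the hypothetical helper post_tab_text below walks the nodes after <tab/>, always keeps text nodes (via the Text node's data method) and keeps the text of elements such as <xref> (via textContent) only when asked. The inline XML is a trimmed copy of the question's chapter; loading file.xml with load_xml(location => ...) as in the question should work the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;   # also exports XML_TEXT_NODE and XML_ELEMENT_NODE

# Hypothetical helper: render everything after <tab/> in $section as one string.
# Text nodes always contribute their data; element nodes such as <xref>
# contribute their textContent only when $keep_elements is true.
sub post_tab_text {
    my ($section, $keep_elements) = @_;
    my $out = '';
    for my $node ($section->findnodes('tab/following-sibling::node()')) {
        my $type = $node->nodeType;
        if ($type == XML_TEXT_NODE) {
            $out .= $node->data;
        }
        elsif ($type == XML_ELEMENT_NODE && $keep_elements) {
            $out .= $node->textContent;
        }
    }
    $out =~ s/^\s+|\s+$//g;     # trim leading/trailing whitespace
    $out =~ s/\s+/ /g;          # collapse the XML's line breaks into single spaces
    $out =~ s/\s+([,.])/$1/g;   # drop the space left before "," and "."
    return $out;
}

# Trimmed copy of the question's chapter (<histories> omitted; it precedes <tab/>).
my $xml = <<'XML';
<chapter number="3-1">
<section number="3-1-1">
<catchline>Declaration of policy.</catchline>
<tab/>
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.
</section>
<section number="3-1-1.1">
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">Title 16, Chapter 10a, Utah Revised Business Corporation Act</xref>
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
,
<xref depth="3" refnumber="3-1-13.7" start="0">3-1-13.7</xref>
, and
<xref depth="3" refnumber="3-1-16.1" start="0">3-1-16.1</xref>
.
</section>
</chapter>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
for my $section ($dom->findnodes('/chapter/section')) {
    printf "Section #%s - %s\n", $section->getAttribute('number'),
        $section->findvalue('catchline');
    print "SECDATA (text only): ", post_tab_text($section, 0), "\n";
    print "SECDATA (with xref): ", post_tab_text($section, 1), "\n\n";
}
```

For 3-1-1 both calls return just the policy sentence. For 3-1-1.1, passing 0 keeps only the loose text, which leaves stray commas where the <xref> section numbers used to be; passing 1 keeps the cross-reference text and yields a readable sentence. The final substitution that removes whitespace before commas and periods is a cosmetic assumption about this particular data.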