XPath expression to get this grandparent

I'm new to XPath, and I'm working with an XML file that looks like this:

<doc>
  <component>
    <author> Bob </author>
  </component>

  <component>
    <sB>
      <component>
        <section ID='S1'>
          <title>Some s1 title</title>
        </section>
      </component>
      <component>
        <section ID='S2'>
          <title>Some s2 title</title>
        </section>
      </component>
    </sB>
  </component>
</doc>

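I want to retrieve the component item above the section with ID = S1, or the one whose section has a title element with the text 'Some s1 title'. I can't count on these things being in any particular order.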

So far I have tried:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
res = tree.getroot().findall(".//*[title='Some s1 title']../../")
for i in res:
    ET.dump(i)

but this gets me both components, not just the one with the matching title.

I also tried searching at the section ID level, like this:

res = tree.getroot().findall(".//*section[@ID='S1']/../")
for i in res:
    ET.dump(i)

But this doesn't give me the parent component (the whole component); it only gives me the section.

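Based on the simple example syntax I've seen online, both of these seem like they should work, but clearly in both cases I'm missing some understanding of what is actually happening. Can someone clarify what is going on here and why I'm not getting what I expect?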

Both XPaths have syntax errors:

.//*[title='Some s1 title']../../ is missing a / after the predicate. And even with that fixed, it goes one level too far up.
.//*section[@ID='S1']/../ : the * cannot come before section.
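Under CPython's ElementTree, neither path is actually rejected, which is why the question gets results rather than an exception. As far as I can tell from the ElementPath module's source (an implementation detail, not documented behavior), it tolerates the missing / after a predicate, reads *section as * followed by a section step, and treats a trailing / as /*. The sketch below inlines the sample document instead of reading test.xml and reproduces both outcomes described in the question:

```python
import xml.etree.ElementTree as ET

# The sample document from the question, inlined instead of parsed from test.xml.
XML = """<doc>
  <component>
    <author> Bob </author>
  </component>
  <component>
    <sB>
      <component>
        <section ID='S1'>
          <title>Some s1 title</title>
        </section>
      </component>
      <component>
        <section ID='S2'>
          <title>Some s2 title</title>
        </section>
      </component>
    </sB>
  </component>
</doc>"""

root = ET.fromstring(XML)

# First attempt: matches section S1, ".." -> inner component, ".." -> sB,
# and the trailing "/" is treated as "/*", i.e. every child of sB.
first = root.findall(".//*[title='Some s1 title']../../")
print([el.tag for el in first])  # ['component', 'component']

# Second attempt: "*section" acts like "*/section"; ".." climbs to the
# component, and the trailing "/*" selects its children, i.e. the section.
second = root.findall(".//*section[@ID='S1']/../")
print([el.tag for el in second])  # ['section']
```

So the first path lands on sB and returns all of its children, and the second stops one step too early; a single .. (or, better, the predicate approach below) is what was intended.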

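But rather than fixing these and working from there, note that you don't actually need to select along the parent or ancestor axis at all. It is better to use a predicate higher up in the hierarchy: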

This XPath

//component[section/@ID='S1']

selects the component elements that have a section child element whose ID attribute value equals 'S1'.

This XPath

//component[section/title='Some s1 title']

selects the component elements that have a section child element with a title child element whose string value equals 'Some s1 title'.
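ElementTree's findall() does not accept a predicate that contains a path such as section/@ID or section/title, so neither expression above runs there as written. A minimal standard-library sketch, assuming you want to stay on ElementTree: select every component with iter() and apply the inner condition with find(), which does support the simpler section[@ID='S1'] and section[title='...'] forms. The helper name components_where is hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (the author component is omitted).
XML = """<doc><component><sB>
  <component><section ID='S1'><title>Some s1 title</title></section></component>
  <component><section ID='S2'><title>Some s2 title</title></section></component>
</sB></component></doc>"""

root = ET.fromstring(XML)


def components_where(root, child_path):
    """Hypothetical helper emulating //component[child_path] in ElementTree.

    Keeps every component element (at any depth) for which
    find(child_path), evaluated relative to that component, finds a match.
    """
    return [c for c in root.iter("component") if c.find(child_path) is not None]


# Emulates //component[section/@ID='S1']
by_id = components_where(root, "section[@ID='S1']")

# Emulates //component[section/title='Some s1 title']
by_title = components_where(root, "section[title='Some s1 title']")

print(len(by_id), by_id[0].find("section").get("ID"))  # 1 S1
print(len(by_title), by_title[0] is by_id[0])  # 1 True
```

The outer component is not returned because its child is sB, not section, which mirrors how the XPath predicate only looks at direct section children.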

Notes on Python XPath library quirks:

ElementTree: supports only a limited subset of XPath. Avoid.
lxml: use xpath() rather than findall().

See also

XPath SyntaxError: invalid predicate

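Write an XPath expression that selects component, and then use a predicate (the condition inside square brackets) to decide which components you want. For example: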

a component containing a section with ID = 'S1'

//component[./section[@ID='S1']]

or a component containing section/title = 'Some s1 title'

//component[./section/title/text() = 'Some s1 title']

or a component containing a section with ID = 'S1' where that section has title = 'Some s1 title'

//component[./section[@ID='S1']/title/text() = 'Some s1 title']

and other variations are possible.
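If you are stuck with ElementTree, the AND variant still maps onto it fairly directly, because find() accepts several predicates in a row on one step. The OR case from the question (ID = S1 or the matching title) has no ElementTree predicate syntax, so this sketch does that part in plain Python. It is a standard-library approximation under those assumptions; with lxml you would instead pass the XPath strings above to xpath().

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question.
XML = """<doc><component><sB>
  <component><section ID='S1'><title>Some s1 title</title></section></component>
  <component><section ID='S2'><title>Some s2 title</title></section></component>
</sB></component></doc>"""

root = ET.fromstring(XML)
components = list(root.iter("component"))

# AND: emulates //component[./section[@ID='S1']/title/text() = 'Some s1 title'].
# Both predicates must hold for the same section child.
both = [c for c in components
        if c.find("section[@ID='S1'][title='Some s1 title']") is not None]

# Same shape with a mismatched pair: no section has ID S2 and the s1 title.
mismatch = [c for c in components
            if c.find("section[@ID='S2'][title='Some s1 title']") is not None]

# OR: ElementTree has no "or" inside a predicate, so combine two find() calls.
either = [c for c in components
          if c.find("section[@ID='S1']") is not None
          or c.find("section[title='Some s1 title']") is not None]

print(len(both), len(mismatch), len(either))  # 1 0 1
```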
