Searching an entire tree with etree

I'm using xml.etree.ElementTree as ET, which seems to be the go-to library, but if there's something better out there I'm interested.

Suppose I have a tree:

doc = """
<top>
<second>
<third>
    <subthird></subthird>
    <subthird2>
         <subsubthird>findme</subsubthird>
    </subthird2>
</third>
</second>
</top>"""

For the sake of the question, assume this already exists in an ElementTree named myTree.

I want to update findme to found. Is there a simple way to do it other than walking down the tree level by level like this?

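# Index through each level by position: top -> second -> third -> subthird2 -> subsubthird.
# (getchildren() was removed in Python 3.9; indexing an element directly is equivalent.)
myTree.getroot()[0][0][1][0].text = 'found'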

The problem is that I have a large XML tree in which I want to update values like this, and I can't find a clear, Pythonic way to do it.

You can use an XPath expression to get every element with a specific tag name, like this:

for el in myTree.getroot().findall(".//subsubthird"):
    el.text = 'found'

If you need to find all tags that have a specific text value, take a look at this answer: Find element by text with XPath in ElementTree.
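As a rough sketch of the text-based case, one option that stays inside the standard library is to walk every element with iter() and compare its text. This is only an illustration built on the sample document from the question, not necessarily what the linked answer does. On Python 3.7 or later, a predicate such as ".//*[.='findme']" passed to findall() should select the same element.

```python
import xml.etree.ElementTree as ET

doc = """
<top>
<second>
<third>
    <subthird></subthird>
    <subthird2>
         <subsubthird>findme</subsubthird>
    </subthird2>
</third>
</second>
</top>"""

root = ET.fromstring(doc)

# iter() visits the root and every descendant in document order,
# so no knowledge of the nesting depth is needed.
for el in root.iter():
    if el.text == 'findme':
        el.text = 'found'

print(ET.tostring(root, encoding='unicode'))
```

This only matches elements whose text is exactly 'findme'. If the real data has surrounding whitespace, comparing against el.text.strip() (after checking that el.text is not None) would be needed instead.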

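I use lxml with XPath expressions. ElementTree supports an abbreviated XPath syntax, but since I don't use it, I don't know how extensive it is. The nice thing about XPath is that you can write element selectors as complex as you need. In this example, the selector is based on nesting: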

import lxml.etree 
doc = """
<top>
<second>
<third>
    <subthird></subthird>
    <subthird2>
         <subsubthird>findme</subsubthird>
    </subthird2>
</third>
</second>
</top>"""
root = lxml.etree.XML(doc)
for elem in root.xpath('second/third/subthird2/subsubthird'):
    elem.text = 'found'
print(lxml.etree.tostring(root, pretty_print=True, encoding='unicode'))

But suppose there is something else that identifies the element, such as a unique attribute,

<subthird2 class="foo"><subsubthird>findme</subsubthird></subthird2>

then your XPath would be //subthird2[@class="foo"]/subsubthird.
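For anyone who wants to stay with xml.etree.ElementTree, here is a minimal sketch of the same attribute-based selection using its limited XPath subset. The extra "leave me" sibling is made up for illustration, to show that only the element under class="foo" is changed. Note the leading "." in the path: ElementTree's findall() expects relative paths, so ".//" is used instead of "//".

```python
import xml.etree.ElementTree as ET

# Sample data: the second <subthird2> carries the identifying attribute.
doc = """
<top>
<second>
<third>
    <subthird2><subsubthird>leave me</subsubthird></subthird2>
    <subthird2 class="foo"><subsubthird>findme</subsubthird></subthird2>
</third>
</second>
</top>"""

root = ET.fromstring(doc)

# [@class='foo'] keeps only <subthird2> elements with that attribute value;
# the trailing /subsubthird then selects their direct children.
for elem in root.findall(".//subthird2[@class='foo']/subsubthird"):
    elem.text = 'found'

texts = [e.text for e in root.iter('subsubthird')]
print(texts)
```

With lxml, the expression from the answer can be passed to root.xpath() unchanged.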
