Python lxml: How do I get an XML tag name with an XPath selector?



I'm trying to parse the following XML with Python and lxml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/bind9.xsl"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256
          </TotalUse>
          <InUse>835252452
          </InUse>
          <BlockSize>598212608
          </BlockSize>
          <ContextSize>52670016
          </ContextSize>
          <Lost>0
          </Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>

The goal is to extract the tag name and text of each element under bind/statistics/memory/summary, producing the following mapping:

TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0

I've managed to extract the element values, but I can't figure out the XPath expression that returns the element tag names.

Sample script:

from lxml import etree as et

def main():
    xmlfile = "bind982.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "??????"  ## what to put here...?
    value_selector = "text()"
    with open(xmlfile, "rb") as data:  # binary mode, since the file declares its own encoding
        xmldata = et.parse(data)
    etree = xmldata.getroot()
    statlist = etree.xpath(location)
    for stat in statlist:
        label = stat.xpath(label_selector)[0]
        value = stat.xpath(value_selector)[0]
        print("{0}: {1}".format(label, value))

if __name__ == '__main__':
    main()

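I know I could use label = stat.tag instead of stat.xpath(), but the script has to be generic enough to also handle other XML fragments where the label selector is different.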

Which XPath selector returns an element's tag name?

Just use XPath's name() and drop the [0] index, because name() returns a string rather than a list.

from lxml import etree as et

def main():
    xmlfile = "ExtractXPathTagName.xml"
    location = "bind/statistics/memory/summary/*"
    label_selector = "name()"  ## name() returns the tag name as a string
    value_selector = "text()"
    with open(xmlfile, "rb") as data:
        xmldata = et.parse(data)
    etree = xmldata.getroot()
    statlist = etree.xpath(location)
    for stat in statlist:
        label = stat.xpath(label_selector)     # a single string, so no [0]
        value = stat.xpath(value_selector)[0]  # a list of text nodes, so take the first
        print("{0}: {1}".format(label, value).strip())

if __name__ == '__main__':
    main()

Output

TotalUse: 1232952256
InUse: 835252452
BlockSize: 598212608
ContextSize: 52670016
Lost: 0
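To keep the script generic, as the question asks, both selectors can be passed in as parameters. The sketch below is an illustration under assumptions: it embeds the question's XML inline (so no file is needed), and `extract` and `first` are hypothetical helper names. `first` covers the difference the answer points out: node-set selectors such as `text()` return a list, while `name()` and `string()` return a single string.

```python
from lxml import etree as et

# Inline copy of the question's XML, kept as bytes because it has an encoding declaration.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<isc version="1.0">
  <bind>
    <statistics version="2.2">
      <memory>
        <summary>
          <TotalUse>1232952256
          </TotalUse>
          <InUse>835252452
          </InUse>
          <BlockSize>598212608
          </BlockSize>
          <ContextSize>52670016
          </ContextSize>
          <Lost>0
          </Lost>
        </summary>
      </memory>
    </statistics>
  </bind>
</isc>"""


def first(result):
    """Node-set selectors such as text() return a list; scalar ones such as name() return a string."""
    if isinstance(result, list):
        return result[0] if result else ""
    return result


def extract(root, location, label_selector="name()", value_selector="string()"):
    """Hypothetical helper: apply both selectors to every node matched by location."""
    stats = {}
    for node in root.xpath(location):
        label = str(first(node.xpath(label_selector)))
        value = str(first(node.xpath(value_selector))).strip()
        stats[label] = value
    return stats


root = et.fromstring(XML)
stats = extract(root, "bind/statistics/memory/summary/*")
for label, value in stats.items():
    print("{0}: {1}".format(label, value))
```

Because `first` accepts either result shape, callers can switch between `text()` and `string()` (or between `name()` and `local-name()`) without touching the loop.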

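I don't think you need XPath for either value: element nodes have the properties tag and text, so you can, for example, use a list comprehension: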

[(element.tag, element.text) for element in etree.xpath(location)]
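A small runnable sketch of this property-based approach, assuming a trimmed inline sample in the same shape as the question's XML. Note that `.text` keeps the whitespace before the closing tag, so stripping it is left to the caller; the dict comprehension does that and also guards against an element with no text (`.text` is `None` then).

```python
from lxml import etree as et

# Trimmed sample with the same structure as the question's XML (assumption for illustration).
root = et.fromstring(b"<isc><bind><statistics><memory><summary>"
                     b"<InUse>835252452\n</InUse><Lost>0\n</Lost>"
                     b"</summary></memory></statistics></bind></isc>")
location = "bind/statistics/memory/summary/*"

# .tag and .text are element properties, so no extra XPath evaluation is needed.
pairs = [(element.tag, element.text) for element in root.xpath(location)]
print(pairs)  # the text still carries the trailing newline from the source

# Strip the whitespace and collect everything into a single dict.
mapping = {element.tag: (element.text or "").strip() for element in root.xpath(location)}
print(mapping)
```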

Or, if you really want to use XPath:

result = [(element.xpath('name()'), element.xpath('string()')) for element in etree.xpath(location)]
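The two forms give the same labels only when no namespaces are involved. The question's XML has none, so the namespaced document below is a hypothetical example: XPath `name()` returns the prefixed name, `local-name()` returns the bare name, and lxml's `.tag` uses `{uri}local` notation. Anyone applying the generic script to namespaced fragments may prefer `local-name()` as the label selector.

```python
from lxml import etree as et

# Hypothetical namespaced variant; the question's XML itself has no namespaces.
root = et.fromstring(b'<summary xmlns:s="urn:example:stats"><s:Lost>0</s:Lost></summary>')
element = root[0]

print(element.xpath("name()"))        # qualified name including the prefix
print(element.xpath("local-name()"))  # local part only
print(element.tag)                    # lxml's {namespace-uri}local notation
print(et.QName(element).localname)    # local part via the lxml API instead of XPath
```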

Of course, you can also build a list of dictionaries:

result = [{ element.tag : element.text } for element in etree.xpath(location)]

result = [{ element.xpath('name()') : element.xpath('string()') } for element in etree.xpath(location)]
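Since the goal in the question is a single tag-to-value mapping, one option is a dict comprehension instead of a list of one-entry dicts. This sketch assumes a trimmed inline sample. One design trade-off: a single dict keeps only the last value when a tag name repeats, while the list-of-dicts form preserves every occurrence.

```python
from lxml import etree as et

# Trimmed sample with the same structure as the question's XML (assumption for illustration).
root = et.fromstring(b"<isc><bind><statistics><memory><summary>"
                     b"<TotalUse>1232952256</TotalUse><Lost>0</Lost>"
                     b"</summary></memory></statistics></bind></isc>")
location = "bind/statistics/memory/summary/*"

# A list of one-entry dicts, as in the answer.
as_list = [{element.tag: element.text} for element in root.xpath(location)]

# A single dict, which can be looked up directly by tag name.
as_dict = {element.xpath("name()"): element.xpath("string()") for element in root.xpath(location)}
print(as_dict["Lost"])
```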
