LXML XPath Query



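I'm writing a quick-and-dirty module that converts an XML document to JSON so that various JavaScript libraries can display it in a table. Along the way I'm learning lxml and its various XPath functions.

I have the following block of code: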

def parse(self):
    parser = etree.XMLParser(remove_comments=True, encoding="UTF-8", no_network=True, recover=True)
    root = etree.XML(self.text, parser=parser)
    self.tree = etree.XPathElementEvaluator(root)
    print(f"test: { self.tree('/*') }")

In my unit test, it prints the following:

test_parse (test_converter.TestConverter) ... test: [<Element {http://www.ivoa.net/xml/VOTable/v1.3}VOTABLE at 0x7fa2b99a8dc0>]

However, when I try a query like the following, I get an empty list back:

print(f"test: { self.tree('/VOTABLE*') }")

I have also tried prepending the namespace to VOTABLE, as shown here, but that returns nothing either:

print(f"test: { self.tree('/{http://www.ivoa.net/xml/VOTable/v1.3}VOTABLE*') }")

Can anyone tell me what rookie mistake I'm making?

Sample data:

<VOTABLE version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.3"
xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/v1.3">
<DESCRIPTION>
VizieR Astronomical Server vizier.u-strasbg.fr
Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
Explanations and Statistics of UCDs:         See LINK below
In case of problem, please report to:    cds-question@unistra.fr
In this version, NULL integer columns are written as an empty string
&lt;TD&gt;&lt;/TD&gt;, explicitely possible from VOTable-1.3
</DESCRIPTION>
<RESOURCE ID="yCat_3135" name="III/135A">
...
</RESOURCE>
...
</VOTABLE>

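Update: solution

Once drec4s pointed out that I had not registered the namespace for the query, I managed to work out what I was doing wrong. Here is the working block of code: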

parser = etree.XMLParser(remove_comments=True, encoding="UTF-8", no_network=True, recover=True)
root = etree.XML(self.text, parser=parser)
self.tree = etree.XPathElementEvaluator(root)
self.tree.register_namespace("n", "http://www.ivoa.net/xml/VOTable/v1.3")
test = self.tree("/n:VOTABLE/n:DESCRIPTION/text()")
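The reason the prefix is needed can be sketched as follows: in XPath 1.0 an unprefixed name only matches elements in no namespace, so a document with a default namespace (xmlns="...") needs a registered prefix in the query. The snippet below is a minimal illustration, not the module's actual code. The shortened XML string and the variable names are assumptions made for the demo, and the local-name() query is shown only as a possible alternative when registering a prefix is inconvenient.

```python
from lxml import etree

# Namespace URI taken from the sample data; the document itself is a
# shortened, hypothetical stand-in for the real VOTable file.
NS = "http://www.ivoa.net/xml/VOTable/v1.3"
xml = (
    f'<VOTABLE version="1.4" xmlns="{NS}">'
    "<DESCRIPTION>demo text</DESCRIPTION>"
    "</VOTABLE>"
)
root = etree.fromstring(xml)
evaluator = etree.XPathElementEvaluator(root)

# An unprefixed name matches only elements that are in no namespace,
# so this query finds nothing in a document with a default namespace.
unprefixed = evaluator("/VOTABLE")

# After binding a prefix to the namespace URI, the same step matches.
evaluator.register_namespace("n", NS)
prefixed = evaluator("/n:VOTABLE")

# Alternative: compare the local name and ignore the namespace entirely.
by_local_name = evaluator("/*[local-name() = 'VOTABLE']")

print(unprefixed)
print([el.tag for el in prefixed])
print([el.tag for el in by_local_name])
```

The prefix "n" is arbitrary; only the URI it is bound to has to match the one declared in the document.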

You can use the xpath method, but you also need to pass it a namespace mapping:

from lxml import etree
from io import StringIO
xmldoc =  StringIO("""
<VOTABLE version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.3"
xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/v1.3">
<DESCRIPTION>
VizieR Astronomical Server vizier.u-strasbg.fr
Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
Explanations and Statistics of UCDs:         See LINK below
In case of problem, please report to:    cds-question@unistra.fr
In this version, NULL integer columns are written as an empty string
&lt;TD&gt;&lt;/TD&gt;, explicitely possible from VOTable-1.3
</DESCRIPTION>
<RESOURCE ID="yCat_3135" name="III/135A">
</RESOURCE>
</VOTABLE>
""")
tree = etree.parse(xmldoc)
root = tree.getroot()
print(root.xpath('//n:DESCRIPTION', namespaces={'n': 'http://www.ivoa.net/xml/VOTable/v1.3'})[0].text)

Output:

VizieR Astronomical Server vizier.u-strasbg.fr
Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
Explanations and Statistics of UCDs:         See LINK below
In case of problem, please report to:    cds-question@unistra.fr
In this version, NULL integer columns are written as an empty string
<TD></TD>, explicitely possible from VOTable-1.3
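As a side note, the {namespace}tag form tried in the question is the notation lxml uses for tag names, and it is accepted by find(), findall() and findtext(), which use the simpler ElementPath language rather than full XPath. The following is a rough sketch under that assumption. The shortened XML string is a made-up stand-in for the real file, and the variable names are illustrative only.

```python
from lxml import etree

# Namespace URI taken from the sample data; the document is a shortened,
# hypothetical stand-in for the real VOTable file.
NS = "http://www.ivoa.net/xml/VOTable/v1.3"
xml = (
    f'<VOTABLE version="1.4" xmlns="{NS}">'
    "<DESCRIPTION>demo text</DESCRIPTION>"
    '<RESOURCE ID="yCat_3135" name="III/135A"/>'
    "</VOTABLE>"
)
root = etree.fromstring(xml)

# find() accepts the {uri}tag form directly, with no prefix mapping.
description = root.find(f"{{{NS}}}DESCRIPTION")

# find() also accepts a prefix mapping, like xpath() does.
resource = root.find("n:RESOURCE", namespaces={"n": NS})

print(description.text)
print(resource.get("name"))
```

For simple child lookups this avoids inventing a prefix at all; xpath() remains the better fit for anything that needs predicates or functions such as text().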
