How to skip the global declaration error when validating with lxml

How do I get past the error "Element 'baz': No matching global declaration available for the validation root., line 1"?

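I need to validate a generic set of XML/XSD pairs that are not necessarily structured alike in any way, so hard-coded or literal rules that target one specific XML structure will not work.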

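The XSDs are generated by GMC Inspire Designer, which is not primarily an XML validator and is very "loose" in how it checks syntax. The global declaration error occurs in my local validator, but not in Inspire Designer, because of that leniency.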

How can I single out this particular set of errors that lxml generates and carry on with validation?

I am using the following code:

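from os import listdir
from os.path import isfile, join

from lxml import etree

my_path = "."  # directory holding the XML files (placeholder value)
xsd_path = join(my_path, "schema.xsd")  # hypothetical schema file name

# get a list of all files in the working directory that are .xml files
xml_files_from_cwd = [xml_f for xml_f in listdir(my_path)
                      if isfile(join(my_path, xml_f)) and xml_f.lower().endswith(".xml")]
# the schema must be loaded from the .xsd file, not from the directory
xml_validator = etree.XMLSchema(file=xsd_path)
for xml in xml_files_from_cwd:
    recovering_parser = etree.XMLParser(recover=True)
    xml_file = etree.parse(join(my_path, xml), parser=recovering_parser)
    successful = False
    try:
        # assertValid() returns None and raises DocumentInvalid on failure
        xml_validator.assertValid(xml_file)
        successful = True
    except etree.DocumentInvalid as e:
        print(f"File not valid: {e}")

    if successful:
        print(f"No errors detected in {xml}.")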

I am having trouble validating XML files that typically look like this:

<baz>
  <bar BEGIN="1">
    ... [repeating elements here]
  </bar>
</baz>

against an XSD that follows this format:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="1">
        <xsd:element name="bar" minOccurs="1" maxOccurs="unbounded">
          .... [repeating elements here]
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The problem here is that validation depends on the validity of the whole document.
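You can see this in lxml itself. As far as I know it has no option for skipping individual schema errors; the closest thing is to call validate() instead of assertValid() and filter error_log yourself. The sketch below does that with a small illustrative schema rather than a generated one; IGNORED_MESSAGES and validate_ignoring are hypothetical names. Filtering only hides the message: the root has no declaration, so there is nothing to check the rest of the tree against, and an empty filtered list does not mean the document is valid.

```python
from lxml import etree

# Message fragments to drop from the report (hypothetical selection).
IGNORED_MESSAGES = ("No matching global declaration available for the validation root",)


def validate_ignoring(schema, doc, ignored=IGNORED_MESSAGES):
    """Run validation and return only the log entries whose message
    does not contain any of the ignored fragments."""
    schema.validate(doc)  # returns a bool; details are collected in schema.error_log
    return [entry for entry in schema.error_log
            if not any(fragment in entry.message for fragment in ignored)]


# Illustrative schema: the only global element is foo.
XSD = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="bar" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))
doc = etree.fromstring(b"<baz><bar/></baz>")  # root name does not match the schema

remaining = validate_ignoring(schema, doc)
print(len(remaining))        # entries left after filtering
print(schema.validate(doc))  # False: the document is still invalid
```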

For example, suppose valid documents are described by this schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:choice>
        <xs:element name="bar">
          <xs:complexType>
            <xs:choice>
              <xs:element name="baz"/>
              <xs:element name="qux"/>
            </xs:choice>
          </xs:complexType>
        </xs:element>
        <xs:element name="quux">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="qux"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

This document would be a problem:

<foo>
  <quuz>
    <qux/>
    ...
  </quuz>
</foo>

Should quuz be bar or quux?

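You might be able to tell from what it contains, but in general, every time you hit a problem you would have to backtrack to each earlier decision and try a different one.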

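This quickly becomes very complicated, because what is valid can depend on an element's content, structure, attribute values and so on. Soon you have so many options to test that it becomes infeasible. You can even construct cases where the number of options is effectively unbounded, so you would need very complex logic just to arrive at a valid value.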
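To make the quuz question concrete, here is a toy model (plain Python, not lxml) of the choice under foo in the example schema: each candidate name maps to a hypothetical predicate on the child tags it accepts. A single qux child fits both bar and quux, so the content alone cannot decide. With k uncertain elements that each have m candidates, a search may have to try up to m^k combinations.

```python
# Toy model of the xs:choice under <foo>: candidate name -> predicate on
# the list of child tag names that the candidate would accept.
CANDIDATES = {
    # bar: xs:choice of exactly one baz or one qux
    "bar": lambda kids: len(kids) == 1 and kids[0] in ("baz", "qux"),
    # quux: xs:sequence containing exactly one qux
    "quux": lambda kids: kids == ["qux"],
}


def possible_names(child_tags):
    """Return every declared name the misnamed element could be renamed to."""
    return [name for name, accepts in CANDIDATES.items() if accepts(child_tags)]


print(possible_names(["qux"]))         # ['bar', 'quux'] -> ambiguous
print(possible_names(["baz"]))         # ['bar']
print(possible_names(["qux", "qux"]))  # [] -> no single rename helps
```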

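In simple cases, such as your example where only the outer tag may be misnamed, you can simply fix the error in memory and retry the validation. But this is not an approach that scales to an entire document.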
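For the simple case of a misnamed root, a minimal sketch of "fix it in memory and retry" could look like this. It assumes the schema has no targetNamespace and declares exactly one global element; revalidate_with_root_rename and the inline schema are illustrative, not part of lxml. If the schema declares several global elements, the sketch gives up rather than guessing.

```python
from lxml import etree

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Illustrative schema with a single global element (no targetNamespace).
XSD = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="bar" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def revalidate_with_root_rename(schema_doc, doc):
    """Validate doc; if that fails and the schema declares exactly one global
    element, rename the root to it in memory and validate once more."""
    schema = etree.XMLSchema(schema_doc)
    if schema.validate(doc):
        return True
    # Top-level xsd:element children of xsd:schema are the global declarations.
    names = [el.get("name") for el in schema_doc.getroot().findall(f"{{{XSD_NS}}}element")]
    root = doc.getroot()
    if len(names) != 1 or root.tag == names[0]:
        return False  # no single, obvious fix to try
    root.tag = names[0]  # changes the parsed tree only, not the file on disk
    return schema.validate(doc)


schema_doc = etree.ElementTree(etree.fromstring(XSD))
doc = etree.ElementTree(etree.fromstring(b"<baz><bar/><bar/></baz>"))
print(revalidate_with_root_rename(schema_doc, doc))  # True
print(doc.getroot().tag)                             # foo
```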

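Note: in real life you may actually know what to expect, in which case you can follow a strategy of attempting validation and, whenever it fails, fixing the problem (because you do know what the options are) and repeating until you reach the end of the document. My answer only aims to show that there is no generic solution here.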
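The strategy in the note above can be sketched as a loop that alternates validation and known repairs. In practice validate would wrap schema.validate() and read schema.error_log, and each repair would encode what you know about your own files; here a toy validator on the standard library's ElementTree stands in so the loop runs on its own. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def validate_with_repairs(doc, validate, repairs, max_rounds=10):
    """Alternate validation and known repairs until the document is valid,
    no repair applies, or max_rounds is reached."""
    for _ in range(max_rounds):
        errors = validate(doc)
        if not errors:
            return True
        # Try each known repair on the first reported error only.
        if not any(repair(doc, errors[0]) for repair in repairs):
            return False  # an error we have no fix for: stop here
    return False


# Hypothetical stand-in for a real schema check: root must be <foo>
# and every child must be <bar>.
def toy_validate(root):
    if root.tag != "foo":
        return [f"root: expected foo, got {root.tag}"]
    return [f"child: expected bar, got {c.tag}" for c in root if c.tag != "bar"]


def fix_root(root, message):
    if message.startswith("root:"):
        root.tag = "foo"
        return True
    return False


def fix_child(root, message):
    if message.startswith("child:"):
        for child in root:
            if child.tag != "bar":
                child.tag = "bar"  # fix one offending child per round
                return True
    return False


root = ET.fromstring("<baz><bar/><bax/></baz>")
result = validate_with_repairs(root, toy_validate, [fix_root, fix_child])
print(result)                      # True
print(ET.tostring(root).decode())  # <foo><bar /><bar /></foo>
```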

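So the answer to "can we continue validating a file past the initial failure condition?" seems to be no, because outside simple or trivial cases there is no guarantee that any further validation would produce a meaningful result.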
