Identifying corrupt .xml files in Python



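I have a small Python script that reads several .xml files. I now need to verify that these .xml files are not corrupted in any way. How can I check this? My current approach is: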

from xml.etree import ElementTree as ET

xml_tree = ET.parse(path)  # path = path to the .xml file
xml_file = xml_tree.getroot()

If the XML file is corrupt (that is, not well-formed), ET.parse() raises a ParseError exception:

>>> print open('test.xml').read()
This is not an XML file
>>> from xml.etree import ElementTree as ET
>>> ET.parse('test.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0

Simply catch the exception:

try:
    ET.parse(path)
except ET.ParseError:
    print('{} is corrupt'.format(path))
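Since the script reads several files, the same try/except can be wrapped in a small helper. The following is only a sketch: the function name find_corrupt_xml and the decision to also report unreadable files are assumptions for illustration, not part of the original answer. Note that this only checks that each file is well-formed XML; it does not validate the content against a schema.

```python
from xml.etree import ElementTree as ET


def find_corrupt_xml(paths):
    """Return a dict mapping each problematic path to an error message.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    corrupt = {}
    for path in paths:
        try:
            ET.parse(path)
        except ET.ParseError as err:
            # The file exists but is not well-formed XML.
            corrupt[path] = str(err)
        except OSError as err:
            # Assumption: a missing or unreadable file is reported as well.
            corrupt[path] = "cannot read file: {}".format(err)
    return corrupt
```

Calling find_corrupt_xml(['a.xml', 'b.xml']) returns an empty dict when every file parses, so the assertion in the script can be as simple as `assert not find_corrupt_xml(paths)`.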
