Is using lxml the right way to check an XML file for syntax errors?



I am trying to write a Python script that uses lxml to check the following pom.xml for syntax errors, then confirms that <version> is a SNAPSHOT and updates <version> to match the format ci_{git hub org}_{branch name}-SNAPSHOT.

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.wsi.devops</groupId>
  <artifactId>python-test</artifactId>
  <version>1.0-SNAPSHOT</version>
</project>

Here is where I am so far with my solution:

# For XML validation, importing the etree module from the lxml 
# package, as well as sys for handling input.
from lxml import etree
import sys
#filename as command line arguments
filename_xml = sys.argv[1]
# parse xml
try:
    doc = etree.parse(sys.argv[1])
    print('XML well formed, syntax ok.')
# check for XML syntax errors
except etree.XMLSyntaxError as err:
    print('XML Syntax Error, see error_syntax.log')
    with open('error_syntax.log', 'w') as error_log_file:
        error_log_file.write(str(err.error_log))
    quit()
except:
    print('Unknown error, exiting.')
    quit()
#Update version
from xml.etree import ElementTree as et
tree = et.parse(sys.argv[1])
tree.find('1.0').text = 'ci_{git hub org name}_{branch name}'
tree.write(sys.argv[1])

I would appreciate some help with any mistakes I have made in the script.

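The main problem with the code is the incorrect use of the ElementTree find() method. It takes a tag name or a simplified XPath expression, whereas you seem to be treating it like str.find(), which takes an arbitrary string. Because no element is named '1.0', find() returns None and the assignment to .text fails. What you need to look up is the version tag.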
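To illustrate the difference, here is a minimal sketch using the standard-library xml.etree.ElementTree on a shortened copy of the POM (the POM string and variable names are made up for the example). It shows that find() matches on tag names, not on text content, and that searching by text needs an explicit loop.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the POM from the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>1.0-SNAPSHOT</version>
</project>"""

root = ET.fromstring(POM)

# find() treats its argument as a tag name or path, so nothing matches "1.0".
by_text_attempt = root.find("1.0")
print(by_text_attempt)

# Searching by text content means iterating over the elements yourself.
matches = [el for el in root.iter() if el.text and "SNAPSHOT" in el.text]
print([el.text for el in matches])
```

Since find() returns None here, writing `tree.find('1.0').text = ...` raises an AttributeError rather than updating anything.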

Your lookup code should look like this:

version = tree.find('ns:version', {'ns': 'http://maven.apache.org/POM/4.0.0'})
if 'SNAPSHOT' in version.text:
    version.text = 'ci_{git hub org name here}_{branch name here}'
    # I guess you have some other code that sets this version properly
else:
    print("Not a snapshot.")

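Note that you always have to supply the namespace to find the version element. That brings me to the second point: why parse the file twice? lxml.etree is a more feature-rich alternative to xml.etree with a largely compatible API, so you only need to import one of them. lxml has the further advantage that its elements have an nsmap attribute, so you do not have to type in the namespace yourself. I guess this makes the script more robust if Apache releases a new Maven version or something: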

tree = etree.parse(sys.argv[1])
version = tree.find('ns:version', {'ns':tree.getroot().nsmap[None]})
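The namespace requirement can be checked in isolation. The sketch below uses the standard-library xml.etree.ElementTree so it runs without extra packages; the same path forms are also accepted by lxml. The POM string, `POM_NS`, and `detected_ns` are illustrative names, and reading the namespace from the root tag is just one possible stand-in for nsmap when lxml is not used.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the POM from the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>1.0-SNAPSHOT</version>
</project>"""

POM_NS = "http://maven.apache.org/POM/4.0.0"
root = ET.fromstring(POM)

# Without a namespace the lookup finds nothing, because the element's
# real tag is "{http://maven.apache.org/POM/4.0.0}version".
unqualified = root.find("version")

# Two equivalent ways to qualify the tag: a prefix map, or Clark notation.
prefixed = root.find("ns:version", {"ns": POM_NS})
clark = root.find(f"{{{POM_NS}}}version")

# The namespace can also be read off the root tag instead of hard-coding it.
detected_ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""

print(unqualified, prefixed.text, clark.text, detected_ns)
```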

Here is the complete code, using only lxml:

from lxml import etree
import sys

filename = sys.argv[1]
# Parse the file; a successful parse means the XML is well formed.
try:
    tree = etree.parse(filename)
    print('XML well formed, syntax ok.')
except OSError:  # check for file errors (e.g. missing file)
    print("Bad file: " + filename)
    sys.exit(1)
# check for XML syntax errors
except etree.XMLSyntaxError as err:
    print('XML Syntax Error, see error_syntax.log')
    with open('error_syntax.log', 'w') as error_log_file:
        error_log_file.write(str(err.error_log))
    sys.exit(1)
except Exception:
    print('Unknown error, exiting.')
    sys.exit(1)
# Update version
version = tree.find('ns:version', {'ns': tree.getroot().nsmap[None]})
# Also covers a missing or empty <version> element
if version is None or 'SNAPSHOT' not in (version.text or ''):
    print("Not a snapshot")
    sys.exit()  # Quitting after a failure is a way to avoid nesting
version.text = 'ci_{git hub org name}_{branch name}'
# I guess you have some other code that sets this version properly
tree.write(filename)
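The version string in the script is still a placeholder. As a rough sketch of how the target format from the question, ci_{git hub org}_{branch name}-SNAPSHOT, could be produced, the hypothetical helpers below assume the organisation and branch names are already available as plain strings. The function names and the sample values are made up for illustration.

```python
def is_snapshot(version_text):
    """Return True if the version text ends with the -SNAPSHOT suffix."""
    return bool(version_text) and version_text.strip().endswith("-SNAPSHOT")


def ci_snapshot_version(org, branch):
    """Build a version of the form ci_<org>_<branch>-SNAPSHOT."""
    return f"ci_{org}_{branch}-SNAPSHOT"


# Hypothetical values; a real script would obtain them elsewhere.
current = "1.0-SNAPSHOT"
if is_snapshot(current):
    new_version = ci_snapshot_version("example-org", "feature-x")
    print(new_version)
```

In the script, the result would be assigned to version.text before calling tree.write().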
