How to remove all XML tags matching a tagName and attribute in Python



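I have a large flattened XSD file in which every tag is prefixed with "xs:". I have compiled a list of the unused types in my flattened XML, and I would like an automated way to remove the start tag, the end tag, and everything in between.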

Sample XSD:

<!--W3C XML Schema generated by XMLSpy v2019 rel. 3 sp1 (x64) (http://www.altova.com)-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.opentravel.org/OTA/2003/05" targetNamespace="http://www.opentravel.org/OTA/2003/05" elementFormDefault="qualified">
<xs:annotation>
<xs:documentation xml:lang="en">All Schema files in the OpenTravel Alliance specification are made available according to the terms defined by the OpenTravel License Agreement at http://www.opentravel.org/Specifications/Default.aspx.</xs:documentation>
</xs:annotation>
<xs:simpleType name="AvailabilityStatusType">
<xs:annotation>
<xs:documentation xml:lang="en">Identifies the availability status of an item.</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:NMTOKENS">
<xs:enumeration value="Open">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Close">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="ClosedOnArrival">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="ClosedOnArrivalOnRequest">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory may not be available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="OnRequest">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory may be available.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="RemoveCloseOnly">
<xs:annotation>
<xs:documentation xml:lang="en">Remove Close restriction while keeping other restrictions in place.</xs:documentation>
</xs:annotation>
</xs:enumeration>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="RatePlanEnum">
<xs:annotation>
<xs:documentation xml:lang="en">Identifies rate plan types.</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:NMTOKENS">
<xs:enumeration value="Government">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Negotiated">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Preferred">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Other_">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory may not be available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
</xs:restriction>
</xs:simpleType>
</xs:schema>

Suppose my list of unused types is as follows: myType = ['RatePlanEnum']

This means I want to remove the entire simpleType node whose name is "RatePlanEnum".

I tried:

from lxml import etree
doc = etree.parse('myfile.xml')
for elem in doc.findall('.//xs:simpleType'):
    parent = elem.getparent()
    if elem.attrib.get('name') == 'RatePlanEnum':
        parent.remove(elem)

How can I do this programmatically and write out the XML after all the modifications?

This file uses the namespace xmlns:xs="http://www.w3.org/2001/XMLSchema", so in findall() you must use {http://www.w3.org/2001/XMLSchema} instead of xs:

doc.findall('.//{http://www.w3.org/2001/XMLSchema}simpleType')

Documentation: lxml - Namespaces
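As an alternative to spelling out the full URI in every path, lxml's findall() also accepts a prefix-to-URI mapping through its namespaces argument, which lets the path keep the xs: prefix. A minimal sketch, assuming a tiny inline schema (the NS name and the sample XML are illustrative only, not part of the original file):

```python
from lxml import etree

# Hypothetical mini-schema, used only to keep the example self-contained.
xsd = b'''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="AvailabilityStatusType"/>
  <xs:simpleType name="RatePlanEnum"/>
</xs:schema>'''

# Prefix map: the key is the prefix used in the path expression below.
NS = {'xs': 'http://www.w3.org/2001/XMLSchema'}

root = etree.fromstring(xsd)
# With the map supplied, 'xs:simpleType' resolves to the namespaced tag.
names = [e.get('name') for e in root.findall('.//xs:simpleType', namespaces=NS)]
print(names)
```

The prefix in the map does not have to match the one used in the document; only the URI has to be the same.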


Complete example:

from lxml import etree 
data = '''<!--W3C XML Schema generated by XMLSpy v2019 rel. 3 sp1 (x64) (http://www.altova.com)-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.opentravel.org/OTA/2003/05" targetNamespace="http://www.opentravel.org/OTA/2003/05" elementFormDefault="qualified">
<xs:annotation>
<xs:documentation xml:lang="en">All Schema files in the OpenTravel Alliance specification are made available according to the terms defined by the OpenTravel License Agreement at http://www.opentravel.org/Specifications/Default.aspx.</xs:documentation>
</xs:annotation>
<xs:simpleType name="AvailabilityStatusType">
<xs:annotation>
<xs:documentation xml:lang="en">Identifies the availability status of an item.</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:NMTOKENS">
<xs:enumeration value="Open">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Close">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="ClosedOnArrival">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="ClosedOnArrivalOnRequest">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory may not be available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="OnRequest">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory may be available.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="RemoveCloseOnly">
<xs:annotation>
<xs:documentation xml:lang="en">Remove Close restriction while keeping other restrictions in place.</xs:documentation>
</xs:annotation>
</xs:enumeration>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="RatePlanEnum">
<xs:annotation>
<xs:documentation xml:lang="en">Identifies rate plan types.</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:NMTOKENS">
<xs:enumeration value="Government">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Negotiated">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Preferred">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory is not available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
<xs:enumeration value="Other_">
<xs:annotation>
<xs:documentation xml:lang="en">Inventory may not be available for sale to arriving guests.</xs:documentation>
</xs:annotation>
</xs:enumeration>
</xs:restriction>
</xs:simpleType>
</xs:schema>'''
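doc = etree.fromstring(data)
# findall() returns a list, so removing elements inside the loop is safe
for elem in doc.findall('.//{http://www.w3.org/2001/XMLSchema}simpleType'):
    parent = elem.getparent()
    if elem.attrib.get('name') == 'RatePlanEnum':
        parent.remove(elem)
# Serialize the modified tree back to a string
print(etree.tostring(doc).decode())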
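Since the question mentions a whole list of unused types, the same loop can be generalized to check each name against a set. The following is a sketch under stated assumptions: the helper name remove_simple_types, its parameters, and the small inline schema are hypothetical and only serve to keep the example self-contained.

```python
from lxml import etree

XS = 'http://www.w3.org/2001/XMLSchema'

def remove_simple_types(root, unused_names):
    """Remove every xs:simpleType whose name attribute is in unused_names."""
    unused = set(unused_names)
    removed = []
    # findall() returns a list, so removing elements while looping is safe.
    for elem in root.findall('.//{%s}simpleType' % XS):
        if elem.get('name') in unused:
            elem.getparent().remove(elem)
            removed.append(elem.get('name'))
    return removed

# Hypothetical mini-schema standing in for the real file.
xsd = b'''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="AvailabilityStatusType"/>
  <xs:simpleType name="RatePlanEnum"/>
</xs:schema>'''

root = etree.fromstring(xsd)
removed = remove_simple_types(root, ['RatePlanEnum'])
remaining = [e.get('name') for e in root.iter('{%s}simpleType' % XS)]

# Serialize the modified tree; these bytes can be written to a file as-is.
output = etree.tostring(root, xml_declaration=True, encoding='UTF-8')
print(output.decode())
```

To save the result, the bytes in output can be written to a file opened in binary mode. Anonymous simpleType elements without a name attribute are left untouched, because elem.get('name') returns None for them.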

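With lxml, you can apply an XSLT 1.0 stylesheet that leaves out the element to be removed, whose name is passed to the stylesheet as a parameter. The rest of the file is copied by the identity template, and unnecessary whitespace is removed by xsl:strip-space.

Here the parameter is passed via remov (= RatePlanEnum):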

remov = ET.XSLT.strparam('RatePlanEnum')  

The XSLT 1.0 stylesheet (remove.xslt):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*" />
  <xsl:param name="toRemove" />
  <!-- Identity template: copies every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>
  <!-- Copy every simpleType except the one named $toRemove.
       not(@name = ...) also keeps anonymous simpleTypes that have no
       name attribute, which a plain @name != ... test would drop. -->
  <xsl:template match="xs:simpleType">
    <xsl:if test="not(@name = $toRemove)">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

The Python code that calls this stylesheet could be:

import lxml.etree as ET
# Path to the schema to clean (same file name as in the question)
input_xml_filename = 'myfile.xml'
# Parse XML and XSLT
dom = ET.parse(input_xml_filename)
xslt = ET.parse('remove.xslt')
transform = ET.XSLT(xslt)
# Set the parameter which specifies the element to be removed
remov = ET.XSLT.strparam('RatePlanEnum')
result = transform(dom, toRemove=remov)
# Print the result (tostring() returns bytes, so decode before printing)
print(ET.tostring(result, pretty_print=True).decode())
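The stylesheet approach can also be stretched to several names at once. XSLT 1.0 parameters passed from lxml are plain strings, so one possible workaround is to pass a delimited list and test membership with contains(). This is only a sketch: the '|' delimiter convention, the inline stylesheet, the sample schema, and the type name LegacyCodeType are assumptions made for illustration.

```python
from lxml import etree as ET

# Variant of the stylesheet above: $toRemove holds a '|'-delimited list
# such as '|A|B|' instead of a single name (an assumed convention).
XSLT_SOURCE = b'''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:param name="toRemove"/>
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
  </xsl:template>
  <xsl:template match="xs:simpleType">
    <xsl:if test="not(@name and contains($toRemove, concat('|', @name, '|')))">
      <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>'''

# Hypothetical mini-schema standing in for the real file.
xsd = b'''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="AvailabilityStatusType"/>
  <xs:simpleType name="RatePlanEnum"/>
  <xs:simpleType name="LegacyCodeType"/>
</xs:schema>'''

unused = ['RatePlanEnum', 'LegacyCodeType']
transform = ET.XSLT(ET.fromstring(XSLT_SOURCE))
# Wrap every name in delimiters so that partial names cannot match.
to_remove = ET.XSLT.strparam('|' + '|'.join(unused) + '|')
result = transform(ET.fromstring(xsd), toRemove=to_remove)

kept = [e.get('name') for e in
        result.getroot().iter('{http://www.w3.org/2001/XMLSchema}simpleType')]
print(ET.tostring(result, pretty_print=True).decode())
```

The '|' character cannot occur in an XSD type name, which is why it works as a delimiter here; wrapping each name on both sides prevents a name like RatePlan from matching RatePlanEnum.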
