I am trying to remove tags from an ALTO XML file. My ALTO file looks like this:
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-2.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>filename</fileName>
</sourceImageInformation>
</Description>
<Layout>
<Page>
<PrintSpace>
<TextBlock>
<Shape><Polygon/></Shape>
<TextLine>
<Shape><Polygon/></Shape>
<String CONTENT="ABCDEF" HPOS="1234" VPOS="1234" WIDTH="1234" HEIGHT="1234" />
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>
And my code is:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
    root.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Here is the error I get:
ValueError: list.remove(x): x not in list
Thank you very much for your help 💐
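ElementFather.remove(ElementChild) only works when ElementChild is a direct child of ElementFather. findall('.//alto:TextBlock', ns) returns descendants at any depth, and TextBlock is not a direct child of the root, so root.remove() raises the ValueError. In your case, you have to call remove on PrintSpace, which is the parent of TextBlock.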
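To see the direct-child rule in isolation, here is a minimal sketch on a toy tree. The three-element document and the variable names are made up for illustration and are not taken from the ALTO file.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical): <c/> is a grandchild of the root <a>.
root = ET.fromstring("<a><b><c/></b></a>")
b = root.find("b")
c = root.find(".//c")

try:
    root.remove(c)      # fails: c is not a direct child of root
    error = None
except ValueError as exc:
    error = str(exc)

b.remove(c)             # succeeds: b is the direct parent of c
remaining = root.find(".//c")
```

The first call raises ValueError because remove() only looks at the direct children of the element it is called on; the second call goes through the actual parent and succeeds.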
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
    PrintSpace = root.find('.//alto:PrintSpace',ns)
    PrintSpace.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Note: this code is only an example of a working solution; you can of course improve it.
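One possible improvement, as a sketch only: the code above always looks up the first PrintSpace, so it would presumably fail again if a file had TextBlock elements under several parents. The version below walks every element and removes matching direct children from their actual parent. The helper name remove_elements, the constant ALTO_NS, and the shortened SAMPLE document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace, copied from the ALTO file in the question.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def remove_elements(root, tag, namespace=ALTO_NS):
    """Hypothetical helper: remove every `tag` element below `root`.

    Returns the number of elements removed.
    """
    qualified = f"{{{namespace}}}{tag}"
    removed = 0
    # Snapshot both the element list and each child list, so that
    # removing children does not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == qualified:
                parent.remove(child)  # always called on the real parent
                removed += 1
    return removed


# Shortened stand-in for the ALTO file (hypothetical content).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock><TextLine><String CONTENT="ABCDEF"/></TextLine></TextBlock>
    <TextBlock/>
  </PrintSpace></Page></Layout>
</alto>"""

root = ET.fromstring(SAMPLE)
count = remove_elements(root, "TextBlock")
```

With a file on disk, the same helper could be called on tree.getroot() before tree.write(...), keeping the ET.register_namespace call from the original code so the output has no ns0: prefixes.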