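I'm generating an XML file with Python's etree library (lxml). One node of the generated file is read from an existing XML file. Appending this element breaks pretty_print for the nodes directly before and after it.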
from lxml import etree

# Build the first part of the tree by hand
root = etree.Element("startNode")
subnode1 = etree.SubElement(root, "SubNode1")
subnode1Child1 = etree.SubElement(subnode1, "subNode1Child1")
etree.SubElement(subnode1Child1, "Child1")
etree.SubElement(subnode1Child1, "Child2")

# Read one node from the existing file and append it
f = open('/xml_testdata/ext_file.xml','r')
ext_xml = etree.fromstring(f.read())
ext_subnode = ext_xml.find("ExtNode")
subnode1.append(ext_subnode)

# Continue building the tree after the external node
subnode1Child2 = etree.SubElement(subnode1, "subNode1Child2")
etree.SubElement(subnode1Child2, "Child1")
etree.SubElement(subnode1Child2, "Child2")

tree = etree.ElementTree(root)
tree.write("testfile.xml", xml_declaration=True, pretty_print=True)
This gives the following result:
<startNode>
<SubNode1><subNode1Child1><Child1/><Child2/></subNode1Child1><ExtNode>
<NodeFromExt>
<SubNodeFromExt1/>
</NodeFromExt>
<NodeFromExt>
<SubNodeFromExt2/>
<AnotherSubNodeFromExt2>
<SubSubNode/>
<AllPrettyHere>
<Child/>
</AllPrettyHere>
</AnotherSubNodeFromExt2>
</NodeFromExt>
</ExtNode>
<subNode1Child2><Child1/><Child2/></subNode1Child2></SubNode1>
</startNode>
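Not very readable, is it? It gets even worse when the "subNode1Child" nodes contain more children than in this example!
Without appending the external element, the output looks like this: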
<startNode>
  <SubNode1>
    <subNode1Child1>
      <Child1/>
      <Child2/>
    </subNode1Child1>
    <subNode1Child2>
      <Child1/>
      <Child2/>
    </subNode1Child2>
  </SubNode1>
</startNode>
So the problem is caused by appending the external element!
Is there a way to append the external element without breaking the pretty_print output?
You can get better pretty-printed output by parsing the existing XML file with a parser object that removes ignorable whitespace.
Instead of this:
f = open('/xml_testdata/ext_file.xml','r')
ext_xml = etree.fromstring(f.read())
Use this:
f = open('/xml_testdata/ext_file.xml', 'r')
parser = etree.XMLParser(remove_blank_text=True)
ext_xml = etree.fromstring(f.read(), parser)
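To see the difference without needing the external file, here is a minimal sketch that parses the same markup twice, once with the default parser and once with remove_blank_text=True. EXT_XML and build_document are names made up for this illustration, and the ExtRoot wrapper is an assumption, since the real ext_file.xml isn't shown in the question. As far as I understand the lxml FAQ, the serializer does not re-indent an element that already contains whitespace-only text, because it cannot tell that whitespace apart from data.

```python
from lxml import etree

# Hypothetical stand-in for the contents of ext_file.xml.
EXT_XML = """<ExtRoot>
    <ExtNode>
        <NodeFromExt>
            <SubNodeFromExt1/>
        </NodeFromExt>
    </ExtNode>
</ExtRoot>"""


def build_document(parser=None):
    """Build a cut-down version of the question's tree and pretty-print it."""
    root = etree.Element("startNode")
    subnode1 = etree.SubElement(root, "SubNode1")
    etree.SubElement(subnode1, "subNode1Child1")
    # parser=None falls back to lxml's default parser, which keeps whitespace.
    ext_xml = etree.fromstring(EXT_XML, parser)
    subnode1.append(ext_xml.find("ExtNode"))
    etree.SubElement(subnode1, "subNode1Child2")
    return etree.tostring(root, pretty_print=True, encoding="unicode")


# Default parser: whitespace from the source travels along with ExtNode.
broken = build_document()
# Whitespace-stripping parser: the serializer is free to indent everything.
fixed = build_document(etree.XMLParser(remove_blank_text=True))

print(broken)
print(fixed)
```

The first printout should show the same run-together SubNode1 line as in the question, the second one a uniformly indented tree.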
See also:
- http://lxml.de/api/lxml.etree.XMLParser-class.html
- http://lxml.de/faq.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output
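I was able to mitigate the effect by creating "ExtNode" myself with etree.SubElement and appending only its children from the external file:
ext_node = etree.SubElement(subnode1, "ExtNode")
# Move each NodeFromExt element from the parsed file into the new node
for element in ext_xml.findall("ExtNode/NodeFromExt"):
    ext_node.append(element)
This gives the following result: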
<startNode>
<SubNode1>
<subNode1Child1>
<Child1/>
<Child2/>
</subNode1Child1>
<ExtNode><NodeFromExt>
<SubNodeFromExt1/>
</NodeFromExt>
<NodeFromExt>
<SubNodeFromExt2/>
<AnotherSubNodeFromExt2>
<SubSubNode/>
<AllPrettyHere>
<Child/>
</AllPrettyHere>
</AnotherSubNodeFromExt2>
</NodeFromExt>
</ExtNode>
<subNode1Child2>
<Child1/>
<Child2/>
</subNode1Child2>
</SubNode1>
</startNode>
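Not perfect, but at least human-readable (which is the whole point of pretty_print, right?)
To satisfy my OCD, I'd still be interested in a way to get a perfectly formatted file!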
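For the "perfectly formatted" part: if re-parsing the file with a different parser isn't convenient, one option is to clear the whitespace-only text and tail values on the element before appending it. This is only a sketch. strip_blank_text is a hypothetical helper written for this post, not an lxml function, and EXT_XML is a made-up stand-in for ext_file.xml. Be aware that it also throws away whitespace that might be meaningful in your data.

```python
from lxml import etree

# Hypothetical stand-in for the contents of ext_file.xml.
EXT_XML = """<ExtRoot>
    <ExtNode>
        <NodeFromExt>
            <SubNodeFromExt1/>
        </NodeFromExt>
    </ExtNode>
</ExtRoot>"""


def strip_blank_text(element):
    """Drop whitespace-only text and tail on element and its descendants."""
    for node in element.iter():
        # Comments and processing instructions have a non-string tag;
        # their .text is their content, so only touch real elements here.
        if isinstance(node.tag, str) and node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


# Parsed with the default parser, so the indentation whitespace is kept.
ext_xml = etree.fromstring(EXT_XML)
ext_subnode = ext_xml.find("ExtNode")
strip_blank_text(ext_subnode)

root = etree.Element("startNode")
subnode1 = etree.SubElement(root, "SubNode1")
etree.SubElement(subnode1, "subNode1Child1")
subnode1.append(ext_subnode)
etree.SubElement(subnode1, "subNode1Child2")

result = etree.tostring(root, pretty_print=True, encoding="unicode")
print(result)
```

Newer lxml releases (4.5 and later, I believe) also provide etree.indent(root), which re-indents a tree in place and may be the simpler choice if your version has it.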