Appending an XML node read from a file breaks pretty_print for adjacent nodes



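I am generating an XML file with Python's etree library (lxml). One node in the generated file is read from an existing XML file. Appending this element breaks pretty_print for the nodes immediately before and after it.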

from lxml import etree

# Build the first part of the new tree
root = etree.Element("startNode")
subnode1 = etree.SubElement(root, "SubNode1")
subnode1Child1 = etree.SubElement(subnode1, "subNode1Child1")
etree.SubElement(subnode1Child1, "Child1")
etree.SubElement(subnode1Child1, "Child2")

# Read one node from the existing file and append it
f = open('/xml_testdata/ext_file.xml','r')
ext_xml = etree.fromstring(f.read())
ext_subnode = ext_xml.find("ExtNode")
subnode1.append(ext_subnode)

# Add the remaining nodes and write the file
subnode1Child2 = etree.SubElement(subnode1, "subNode1Child2")
etree.SubElement(subnode1Child2, "Child1")
etree.SubElement(subnode1Child2, "Child2")
tree = etree.ElementTree(root)
tree.write("testfile.xml", xml_declaration=True, pretty_print=True)

This gives the following result:

<startNode>
    <SubNode1><subNode1Child1><Child1/><Child2/></subNode1Child1><ExtNode>
            <NodeFromExt>
                <SubNodeFromExt1/>
            </NodeFromExt>
            <NodeFromExt>
                <SubNodeFromExt2/>
                <AnotherSubNodeFromExt2>
                    <SubSubNode/>
                    <AllPrettyHere>
                        <Child/>
                    </AllPrettyHere>
                </AnotherSubNodeFromExt2>
            </NodeFromExt>
    </ExtNode>
    <subNode1Child2><Child1/><Child2/></subNode1Child2></SubNode1>
</startNode>

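Not very readable, is it? And it gets even worse when the "subNode1Child" elements contain more child nodes than in this example!

Without the appended external element, the output looks like this: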

<startNode>
  <SubNode1>
    <subNode1Child1>
      <Child1/>
      <Child2/>
    </subNode1Child1>
    <subNode1Child2>
      <Child1/>
      <Child2/>
    </subNode1Child2>
  </SubNode1>
</startNode>

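So the problem is caused by appending the external element!

Is there a way to append the external element without breaking the pretty_print output?

You can get better pretty-printed output by parsing the existing XML file with a parser object that removes ignorable whitespace. The whitespace-only text that the default parser keeps between the tags of the external file is what stops pretty_print from re-indenting the surrounding nodes.

Instead of this: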

f = open('/xml_testdata/ext_file.xml','r')
ext_xml = etree.fromstring(f.read())

Use this:

f = open('/xml_testdata/ext_file.xml', 'r')
parser = etree.XMLParser(remove_blank_text=True)
ext_xml = etree.fromstring(f.read(), parser)
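
To see the difference without touching the file system, here is a minimal sketch that builds a reduced version of the tree twice, once with the default parser and once with remove_blank_text=True. The EXT_XML string is a hypothetical stand-in for the contents of ext_file.xml (the real file is not shown in the question), and build() is a helper name made up for this illustration.

```python
from lxml import etree

# Hypothetical stand-in for ext_file.xml; the real file is not shown.
EXT_XML = b"""<extRoot>
    <ExtNode>
        <NodeFromExt>
            <SubNodeFromExt1/>
        </NodeFromExt>
    </ExtNode>
</extRoot>"""


def build(parser=None):
    """Build a small tree, append the external node, return pretty-printed text."""
    root = etree.Element("startNode")
    subnode1 = etree.SubElement(root, "SubNode1")
    etree.SubElement(subnode1, "subNode1Child1")
    # parser=None means lxml's default parser, which keeps whitespace text
    ext_xml = etree.fromstring(EXT_XML, parser)
    subnode1.append(ext_xml.find("ExtNode"))
    etree.SubElement(subnode1, "subNode1Child2")
    return etree.tostring(root, pretty_print=True).decode()


default_output = build()
stripped_output = build(etree.XMLParser(remove_blank_text=True))

print(default_output)
print(stripped_output)
```

Comparing the two printed results should show the neighbours of ExtNode crammed onto shared lines in the first case and one tag per line in the second.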

See also:

  • http://lxml.de/api/lxml.etree.XMLParser-class.html
  • http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output

I was able to mitigate the effect by creating "ExtNode" with etree.SubElement.

ext_node = etree.SubElement(subnode1, "ExtNode")
# Move each NodeFromExt child from the parsed file into the new ExtNode
for element in ext_xml.findall("ExtNode/NodeFromExt"):
    ext_node.append(element)

With the following result:

<startNode>
  <SubNode1>
    <subNode1Child1>
      <Child1/>
      <Child2/>
    </subNode1Child1>
    <ExtNode><NodeFromExt>
      <SubNodeFromExt1/>
        </NodeFromExt>
    <NodeFromExt>
      <SubNodeFromExt2/>
        <AnotherSubNodeFromExt2>
          <SubSubNode/>
          <AllPrettyHere>
            <Child/>
          </AllPrettyHere>
        </AnotherSubNodeFromExt2>
    </NodeFromExt>
  </ExtNode>
    <subNode1Child2>
      <Child1/>
      <Child2/>
    </subNode1Child2>
  </SubNode1>
</startNode>

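Not perfect, but at least human-readable (which is the whole point of pretty_print, right?)

To satisfy my OCD, I would still be interested in a way to get a perfectly formatted file!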
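
One possible route to fully consistent indentation, sketched here under assumptions rather than taken from the thread: if the external element has already been parsed with the default parser, its whitespace-only text and tail values can be cleared before it is appended. The helper name strip_blank_text and the inline XML string are hypothetical; the string only stands in for ext_file.xml.

```python
from lxml import etree


def strip_blank_text(element):
    """Clear whitespace-only .text and .tail on element and its descendants."""
    # iter("*") visits elements only, so comments are left untouched
    for node in element.iter("*"):
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


# Hypothetical stand-in for ext_file.xml, parsed with the default parser
ext_xml = etree.fromstring(
    "<extRoot>\n  <ExtNode>\n    <NodeFromExt>\n      <SubNodeFromExt1/>\n"
    "    </NodeFromExt>\n  </ExtNode>\n</extRoot>"
)
ext_subnode = ext_xml.find("ExtNode")
strip_blank_text(ext_subnode)

root = etree.Element("startNode")
subnode1 = etree.SubElement(root, "SubNode1")
etree.SubElement(subnode1, "subNode1Child1")
subnode1.append(ext_subnode)
etree.SubElement(subnode1, "subNode1Child2")

cleaned_output = etree.tostring(root, pretty_print=True).decode()
print(cleaned_output)
```

This only makes sense when the whitespace really is insignificant; text that contains anything other than whitespace is left alone. On lxml 4.5 or later, calling etree.indent(root) on the finished tree before writing it is another option worth trying.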
