pretty_print is ignored after a '\n' character. For example:
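import lxml.etree as etree

# The first string has a newline right after <root>; the second has none.
strs = ["<root>\n<e1/><e2/></root>",
        "<root><e1/><e2/></root>"]
for s in strs:
    xml = etree.fromstring(s)
    # tostring() returns bytes, so decode before printing (Python 3)
    print(etree.tostring(xml, pretty_print=True).decode())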
The output is:
<root>
<e1/><e2/></root>
<root>
  <e1/>
  <e2/>
</root>
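Both strings are valid XML. The first string contains a "\n" character, and pretty_print is ignored after it.
Is this a bug in lxml, or do I need to do something special to get pretty formatting?
Thanks, Corley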
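The reason for this behavior is explained here: http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output
In short, the "\n" is kept as a text node of <root>, and lxml does not add indentation where whitespace may be significant content. Removing blank text at parse time fixes it.
The correct code is: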
import lxml.etree as etree

strs = ["<root>\n<e1/><e2/></root>",
        "<root><e1/><e2/></root>"]
# Discard whitespace-only text nodes while parsing,
# so pretty_print is free to re-indent the tree.
parser = etree.XMLParser(remove_blank_text=True)
for s in strs:
    xml = etree.fromstring(s, parser=parser)
    # Python 2:
    #   print etree.tostring(xml, pretty_print=True)
    # Python 3.x (tostring() returns bytes; here I assume utf-8 encoding):
    print(etree.tostring(xml, pretty_print=True).decode())
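
To see why the parser option matters, it may help to look at the text node itself. The following is only a small sketch under Python 3; the variable names (src, plain, cleaned and so on) are mine and not from the question. It compares the .text of <root> with and without remove_blank_text, then pretty-prints both trees.

```python
import lxml.etree as etree

src = "<root>\n<e1/><e2/></root>"

# Default parser: the "\n" survives as the text of <root>.
plain = etree.fromstring(src)
plain_text = plain.text
plain_out = etree.tostring(plain, pretty_print=True).decode()

# Parser that drops whitespace-only text nodes between tags.
parser = etree.XMLParser(remove_blank_text=True)
cleaned = etree.fromstring(src, parser=parser)
cleaned_text = cleaned.text
cleaned_out = etree.tostring(cleaned, pretty_print=True).decode()

print(repr(plain_text))    # text node kept by the default parser
print(repr(cleaned_text))  # no text node left after blank-text removal
print(plain_out)
print(cleaned_out)
```

With the default parser the newline stays in the tree as content, so the serializer leaves the children of <root> alone. Once the blank text is removed, nothing stands in the way of the indentation.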