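I am using the minidom module to create an XML document from my data.
At the moment I am struggling to find a Pythonic way to keep minidom from escaping the strings I put into it.
The root of all evil is the _write_data function (line 302 of the module):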
def _write_data(writer, data):
    "Writes datachars to writer."
    if data:
        data = data.replace("&", "&amp;").replace("<", "&lt;"). \
                    replace("\"", "&quot;").replace(">", "&gt;")
        writer.write(data)
What I want is the data written out without those replace calls.
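To make the problem concrete, here is a minimal sketch of the escaping in action. It assumes the input string already contains character references (the sample value is my assumption about the kind of data involved). The text node stores the string unchanged; the ampersands are only rewritten when the tree is serialized.

```python
from xml.dom import minidom

# Assumed sample input: a string that already contains character references.
SNOWMAN = '&#x2603;&#xfe0e;'

dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
dom.documentElement.appendChild(dom.createTextNode(SNOWMAN))

# The node keeps the string as given; escaping happens only on output.
stored = dom.documentElement.firstChild.data
serialized = dom.documentElement.toxml()

print(stored)
print(serialized)
```

So the data in the DOM is fine, and only the serialization step needs to change.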
I found a way to prevent this by monkey-patching two functions:
- the parent node's writexml
- and, inside that patch, _write_data
I prepared an example:
from xml.dom import minidom

# a string that already contains XML character references
SNOWMAN = '&#x2603;&#xfe0e;'

imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)
root = dom.documentElement

evil = dom.createElement('evil')
root.appendChild(evil)
# this does unwanted double escaping:
evil.appendChild(dom.createTextNode(SNOWMAN))

# now for something completely different ...
# this is some way to fix this:
good = dom.createElement('good')
root.appendChild(good)

# - store original ``writexml`` and ``_write_data``
original_writexml = good.writexml
original_write_data = minidom._write_data

def fake_writexml(writer, indent, addindent, newl):
    def fake_writedata(writer, data, *_ignored):
        # extra arguments are tolerated in case the installed minidom passes any
        if data:
            writer.write(data)
    # - overwrite ``_write_data``
    minidom._write_data = fake_writedata
    try:
        # - call original ``writexml``
        #   -> which itself calls the now patched ``_write_data``
        original_writexml(writer, indent, addindent, newl)
    finally:
        # - reset ``_write_data`` again, even if writing failed
        minidom._write_data = original_write_data

# - overwrite ``writexml``
good.writexml = fake_writexml
# - do stuff
good.appendChild(dom.createTextNode(SNOWMAN))
# -> yay, it works!
print(dom.toprettyxml(indent=' '))
# - reset ``writexml`` again
good.writexml = original_writexml
# -> returns trash again..
print(dom.toprettyxml(indent=' '))
It produces the following output:
<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&amp;#x2603;&amp;#xfe0e;</good>
</root>
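For comparison, the same temporary patch can probably be written more compactly with unittest.mock.patch.object, which restores the original function on exit even if serialization raises. This is only a sketch: write_raw is a hypothetical stand-in of mine, and the sample string is assumed to be the same kind of input as in the question. Note that it switches off escaping for everything written inside the with block, attributes included.

```python
from unittest import mock
from xml.dom import minidom

SNOWMAN = '&#x2603;&#xfe0e;'  # assumed sample input


def write_raw(writer, data, *_args, **_kwargs):
    """Hypothetical stand-in for minidom._write_data that skips escaping.

    Extra arguments are accepted and ignored, in case the installed
    minidom passes more than (writer, data).
    """
    if data:
        writer.write(data)


dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
dom.documentElement.appendChild(dom.createTextNode(SNOWMAN))

# patch.object swaps the module-level function and puts it back on exit.
with mock.patch.object(minidom, '_write_data', write_raw):
    raw_xml = dom.documentElement.toxml()

# Outside the block the original escaping is active again.
escaped_xml = dom.documentElement.toxml()

print(raw_xml)
print(escaped_xml)
```

It still reaches into a private function of minidom, so the concern about relying on internals applies here as well.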
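Personally I don't consider this good code, because it messes with minidom's internals and you have to be careful not to make any mistakes.
Please tell me the most Pythonic solution you can think of, so that I can enjoy my snowman ;-)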
Thinking about my problem some more, I had an idea:
wouldn't it be possible to define a new Node type?
It is indeed!
from xml.dom import minidom
SNOWMAN = '&#x2603;&#xfe0e;'
imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)
Here I define my own Node:
class RawText(minidom.Text):
    def writexml(self, writer, indent='', addindent='', newl=''):
        '''
        patching minidom.Text.writexml:1087
        the original calls minidom._write_data:302
        below is a combined version of both, but without the '&' replacements and so on..
        '''
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))
After that, I wrote a helper function for the original minidom.Document that creates new nodes of my own type.
def createRawTextNode(data):
    '''
    helper function for minidom.Document:1519 to create Nodes of RawText
    see minidom.Document.createTextNode:1656
    '''
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    r = RawText()
    r.data = data
    r.ownerDocument = dom # there is no self
    return r

# ... and attach the helper function
dom.createRawTextNode = createRawTextNode
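Regarding the "there is no self" comment: one possible way around the global dom is to bind the helper to the document with types.MethodType, so that it receives the document as self. The following is a sketch under that assumption; create_raw_text_node is a hypothetical name, and RawText is repeated only to keep the snippet self-contained.

```python
import types
from xml.dom import minidom


class RawText(minidom.Text):
    """Text node that writes its data without escaping."""

    def writexml(self, writer, indent='', addindent='', newl=''):
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))


def create_raw_text_node(self, data):
    """Hypothetical helper; `self` is the document it gets bound to."""
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    node = RawText()
    node.data = data
    node.ownerDocument = self
    return node


dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
# Bind the function to this one document so that it receives `self`.
dom.createRawTextNode = types.MethodType(create_raw_text_node, dom)

node = dom.createRawTextNode('&#x2603;&#xfe0e;')
dom.documentElement.appendChild(node)
print(dom.documentElement.toxml())
```

The binding applies to this one document instance only; other documents are left untouched.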
Then carry on as if nothing had happened:
root = dom.documentElement
evil = dom.createElement('evil')
root.appendChild(evil)
evil.appendChild(dom.createTextNode(SNOWMAN))
good = dom.createElement('good')
root.appendChild(good)
# use helper function to create Nodes of RawText
good.appendChild(dom.createRawTextNode(SNOWMAN))
# yay, works! |o_0|
print(dom.toprettyxml(indent=' '))
It finally does what I want!
My output now contains both the escaped and the unescaped string.
<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>
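One caveat worth checking: because a raw node bypasses escaping, it is up to the caller to pass data that is already valid XML content. The sketch below (build is a hypothetical helper, and both input strings are made-up samples) parses the result back with minidom.parseString. A character reference round-trips to the real characters, while a bare ampersand produces output that no longer parses.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class RawText(minidom.Text):
    """Text node that writes its data without escaping."""

    def writexml(self, writer, indent='', addindent='', newl=''):
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))


def build(raw):
    """Hypothetical helper: serialize <root><good>raw</good></root> unescaped."""
    dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
    good = dom.createElement('good')
    dom.documentElement.appendChild(good)
    node = RawText()
    node.data = raw
    node.ownerDocument = dom
    good.appendChild(node)
    return dom.toxml()


# A character reference is written as-is and parses back to the characters.
xml_ok = build('&#x2603;&#xfe0e;')
parsed = minidom.parseString(xml_ok)
snowman = parsed.getElementsByTagName('good')[0].firstChild.data

# A bare ampersand is also written as-is, so the document is not well-formed.
xml_bad = build('fish & chips')
try:
    minidom.parseString(xml_bad)
    bad_is_well_formed = True
except ExpatError:
    bad_is_well_formed = False

print(snowman == '\u2603\ufe0e', bad_is_well_formed)
```

So RawText is best reserved for strings that are known to be pre-escaped, with ordinary text nodes used for everything else.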