Python xml.dom.minidom - Please don't escape my strings



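I use the minidom module to create XML documents from my data.

Currently I am struggling to find a Pythonic way to prevent minidom from escaping the strings I put in there.

The root of all evil is the _write_data function (at line 302 of the module):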

def _write_data(writer, data):
    "Writes datachars to writer."
    if data:
        data = data.replace("&", "&amp;").replace("<", "&lt;"). \
                    replace("\"", "&quot;").replace(">", "&gt;")
        writer.write(data)

What I want is for data to be written without these replacements.
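A minimal sketch of the behaviour in isolation (the variable names are only for illustration). It shows that a text node treats its data as literal characters, which is why the serializer escapes the ampersand, and that parsing the result gives the same literal characters back:

```python
from xml.dom import minidom

dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
# The text node stores these characters literally; they are not markup.
dom.documentElement.appendChild(dom.createTextNode('&#x2603;'))

serialized = dom.toxml()
# Parsing the serialized form returns exactly the characters stored above.
round_tripped = minidom.parseString(serialized).documentElement.firstChild.data

print(serialized)
print(round_tripped)
```

So from minidom's point of view the escaping is correct; the question is how to bypass it on purpose for selected nodes.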


I found a way to prevent this by monkeypatching two functions:

  • the parent node's writexml
  • and, inside that patch:
    • _write_data

I prepared an example:

from xml.dom import minidom
SNOWMAN = '&#x2603;&#xfe0e;'
imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)
root = dom.documentElement
evil = dom.createElement('evil')
root.appendChild(evil)
# this does unwanted double escaping:
evil.appendChild(dom.createTextNode(SNOWMAN))
# now for something completely different ...
# this is some way to fix this:
good = dom.createElement('good')
root.appendChild(good)
# - store original ``writexml`` and ``_write_data``
original_writexml = good.writexml
original_write_data = minidom._write_data

def fake_writexml(writer, indent, addindent, newl):
    def fake_writedata(writer, data, *extra):
        # *extra absorbs additional positional arguments that some
        # Python versions pass to ``_write_data``
        if data:
            writer.write(data)
    # - overwrite ``_write_data``
    minidom._write_data = fake_writedata
    try:
        # - call original ``writexml``
        # -> which itself calls the now patched ``_write_data``
        original_writexml(writer, indent, addindent, newl)
    finally:
        # - reset ``_write_data`` again, even if writing fails
        minidom._write_data = original_write_data
# - overwrite ``writexml``
good.writexml = fake_writexml
# - do stuff
good.appendChild(dom.createTextNode(SNOWMAN))
# -> yay, it works!
print(dom.toprettyxml(indent=' '))
# - reset ``writexml`` again
good.writexml = original_writexml
# -> returns trash again..
print(dom.toprettyxml(indent=' '))

It produces the following output:

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>
<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&amp;#x2603;&amp;#xfe0e;</good>
</root>

Personally, I don't think this is good code: it messes with minidom's internals, and you have to be careful not to make any mistakes.
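If the monkeypatch has to stay, one way to make it harder to get wrong is to put the swap and the restore into a context manager. The following is only a sketch: unescaped_write_data and make_raw are hypothetical helper names, and the code still relies on the private minidom._write_data, so it may break with other Python versions.

```python
from contextlib import contextmanager
from xml.dom import minidom


@contextmanager
def unescaped_write_data():
    """Temporarily swap minidom._write_data for a pass-through writer."""
    original = minidom._write_data

    def passthrough(writer, data, *extra):
        # *extra absorbs any additional positional arguments that other
        # Python versions may pass to _write_data.
        if data:
            writer.write(data)

    minidom._write_data = passthrough
    try:
        yield
    finally:
        # Runs even if serialization raises, so the patch cannot leak.
        minidom._write_data = original


def make_raw(node):
    """Wrap node.writexml so that only this subtree is written unescaped."""
    original_writexml = node.writexml

    def raw_writexml(writer, indent='', addindent='', newl=''):
        with unescaped_write_data():
            original_writexml(writer, indent, addindent, newl)

    node.writexml = raw_writexml
    return node


dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
good = make_raw(dom.createElement('good'))
dom.documentElement.appendChild(good)
good.appendChild(dom.createTextNode('&#x2603;&#xfe0e;'))
result = dom.toxml()
print(result)
```

This does not remove the dependency on minidom's internals; it only guarantees that the original function is restored.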

Please tell me the most Pythonic solution you can think of, so that I can enjoy my snowman ;-)

&#x2603;&#xfe0e;

Thinking further about my problem, I had an idea:

Wouldn't it be possible to define a new Node type?

Indeed it is!

from xml.dom import minidom
SNOWMAN = '&#x2603;&#xfe0e;'
imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)

Here I define my own Node:

class RawText(minidom.Text):
    def writexml(self, writer, indent='', addindent='', newl=''):
        '''
        patching minidom.Text.writexml:1087
        the original calls minidom._write_data:302
        below is a combined version of both, but without the '&' replacements and so on..
        '''
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))

After that, I wrote a helper function for the original minidom.Document that creates new nodes of my own type.

def createRawTextNode(data):
    '''
    helper function for minidom.Document:1519 to create Nodes of RawText
    see minidom.Document.createTextNode:1656
    '''
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    r = RawText()
    r.data = data
    r.ownerDocument = dom  # there is no self
    return r
# ... and attach the helper function
dom.createRawTextNode = createRawTextNode
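The helper above depends on the module-level dom because "there is no self". A possible variation, sketched here under the assumption that a plain function is acceptable, passes the document in explicitly so the same helper works for any document. The name create_raw_text_node is hypothetical and not part of minidom; RawText is repeated so the snippet runs on its own.

```python
from xml.dom import minidom


class RawText(minidom.Text):
    def writexml(self, writer, indent='', addindent='', newl=''):
        # Write the data as-is, without any escaping.
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))


def create_raw_text_node(document, data):
    """Create a RawText node owned by the given document."""
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    node = RawText()
    node.data = data
    node.ownerDocument = document  # explicit instead of a global
    return node


doc = minidom.getDOMImplementation().createDocument(None, 'root', None)
raw = create_raw_text_node(doc, '&#x2603;&#xfe0e;')
doc.documentElement.appendChild(raw)
xml_out = doc.toxml()
print(xml_out)
```

Keeping the helper outside the Document instance also avoids adding attributes to an object owned by minidom.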

And then carry on as if nothing had happened:

root = dom.documentElement
evil = dom.createElement('evil')
root.appendChild(evil)
evil.appendChild(dom.createTextNode(SNOWMAN))
good = dom.createElement('good')
root.appendChild(good)
# use helper function to create Nodes of RawText
good.appendChild(dom.createRawTextNode(SNOWMAN))
# yay, works! |o_0|
print(dom.toprettyxml(indent=' '))

It finally does what I want!

My output now contains both escaped and unescaped strings.

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>
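To check what the two variants mean to an XML consumer, the output can be parsed again. This is a small sketch using the output above as a string literal (the variable names are only for illustration): the escaped text comes back as the literal entity characters, while the raw text is resolved to the actual snowman characters.

```python
from xml.dom import minidom

xml_text = (
    '<?xml version="1.0" ?>'
    '<root>'
    '<evil>&amp;#x2603;&amp;#xfe0e;</evil>'
    '<good>&#x2603;&#xfe0e;</good>'
    '</root>'
)

parsed = minidom.parseString(xml_text)
evil_text = parsed.getElementsByTagName('evil')[0].firstChild.data
good_text = parsed.getElementsByTagName('good')[0].firstChild.data

print(repr(evil_text))  # the literal characters of the entity
print(repr(good_text))  # the characters U+2603 and U+FE0E
```

Because the raw node writes its data without any checks, data such as 'a < b' would make the document malformed, so this node type should only be given strings that are already valid XML content.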
