Python ElementTree: how to stop an element's text from being escaped



I am trying to put HTML data into an element's text node, but it gets escaped as if it were not HTML.

Here is an MWE:

from xml.etree import ElementTree as ET
data = '<a href="https://example.com">Example data gained from elsewhere.</a>'
p = ET.Element('p')
p.text = data
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

The output is…

<p>&lt;a href="https://example.com"&gt;Example data gained from elsewhere.&lt;/a&gt;</p>

What I want is…

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>

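What you are doing is wrong. Assigning p.text = data tells ElementTree to treat the string as plain text content, so the markup characters in it are escaped on output, as expected. You have to parse the string and add the result as a child element instead, like this: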

from xml.etree import ElementTree as ET
data = '<a href="https://example.com">Example data gained from elsewhere.</a>'
d = ET.fromstring(data)
p = ET.Element('p')
p.append(d)
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

This gives the output

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>
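
To illustrate the distinction drawn in this answer between text content and child elements, here is a minimal sketch that builds mixed content by hand. The link text and the surrounding sentence are made up for the example; text before the first child goes in .text, and text after a child's closing tag goes in that child's .tail.

```python
from xml.etree import ElementTree as ET

p = ET.Element('p')
p.text = 'See '                        # text before the first child
a = ET.SubElement(p, 'a', href='https://example.com')
a.text = 'this page'                   # text inside <a>
a.tail = ' for details.'               # text after </a>, still inside <p>

# encoding='unicode' returns a str directly, so no .decode() is needed
html = ET.tostring(p, encoding='unicode', method='html')
print(html)
# <p>See <a href="https://example.com">this page</a> for details.</p>
```

Only the strings stored in .text and .tail are escaped on serialization; the structure itself comes from the Element objects.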

You can parse the HTML string into an Element with ET.fromstring and append it to the parent element:

from xml.etree import ElementTree as ET
data = '<a href="https://example.com">Example data gained from elsewhere.</a>'
p = ET.Element('p')
p.append(ET.fromstring(data))
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)
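
One caveat worth noting: ET.fromstring is an XML parser, so it expects a single root element and well-formed markup. A fragment with leading text or several sibling elements will not parse as-is. The sketch below works around that by wrapping the fragment in a throwaway root; append_fragment and the wrapper tag name are hypothetical names chosen for this example, not part of ElementTree.

```python
from xml.etree import ElementTree as ET

def append_fragment(parent, fragment):
    """Hypothetical helper: parse a well-formed fragment and graft it onto parent."""
    # Wrap in a throwaway root so leading text and multiple siblings
    # still form a single XML document.
    wrapper = ET.fromstring('<wrapper>' + fragment + '</wrapper>')
    existing = list(parent)
    if wrapper.text:
        if existing:
            # Leading text belongs after the last child already present
            existing[-1].tail = (existing[-1].tail or '') + wrapper.text
        else:
            parent.text = (parent.text or '') + wrapper.text
    # Move the parsed children (their .tail text travels with them)
    parent.extend(list(wrapper))
    return parent

p = ET.Element('p')
append_fragment(p, 'Visit <a href="https://example.com">Example</a> or <b>ask</b>.')
html = ET.tostring(p, encoding='unicode', method='html')
print(html)
# <p>Visit <a href="https://example.com">Example</a> or <b>ask</b>.</p>
```

This still assumes the fragment is well-formed XML. Typical real-world HTML such as an unclosed <br> or an &nbsp; entity raises ET.ParseError, in which case a real HTML parser is the safer choice.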
