Parsing XML that contains escaped HTML with etree

I have an XML file that contains escaped HTML in fields like this:

<title>Some records title with html &lt;i&gt; This should be inside escaped html &lt;/i&gt;, end of the title</title>

I can find the element without any problem:

el = titles.find("x:title", NS)

But when I do this:

el.text

it returns the text with the tags unescaped:

'Some records title with html <i> This should be inside escaped html </i>, end of the title'

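Why does this happen? Do I have to escape the HTML tags a second time, even though they are already escaped? I want to be able to get both the escaped and the unescaped HTML out of the XML (sometimes it is shown as plain text, sometimes as formatted text). What is the correct way to do this?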

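The parser decodes entity references while reading the document, so el.text holds the literal characters. With ElementTree you can escape the text again using _escape_attrib(). Note that the leading underscore marks it as a private helper, so it may change between Python versions: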

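import xml.etree.ElementTree as ET

# Sample document: the <i> tags are stored as escaped entities.
text = '''<title>Some records title with html &lt;i&gt; This should be inside escaped html &lt;/i&gt;, end of the title</title>
'''
root = ET.fromstring(text)
# root.text is already unescaped by the parser; escape it again for display as text.
print(ET._escape_attrib(root.text))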

This outputs Some records title with html &lt;i&gt; This should be inside escaped html &lt;/i&gt;, end of the title.
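
Because _escape_attrib() is private, a variant that sticks to documented functions may be safer. The following is only a sketch, not necessarily what the answer's author intended: it swaps in xml.sax.saxutils.escape from the standard library, and the variable names (xml_source, formatted, as_text, serialized) are made up for illustration. It also shows that serializing the element with ET.tostring() escapes the text again automatically.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Same sample document as in the question (hypothetical variable name).
xml_source = (
    "<title>Some records title with html &lt;i&gt; This should be inside "
    "escaped html &lt;/i&gt;, end of the title</title>"
)
root = ET.fromstring(xml_source)

# The parser decodes entity references, so .text contains literal <i> tags.
# Use this value when the title should be rendered as formatted HTML.
formatted = root.text

# Escape &, < and > again when the markup should be displayed as plain text.
as_text = escape(formatted)

# Writing the element back out escapes its text without any extra work.
serialized = ET.tostring(root, encoding="unicode")

print(formatted)
print(as_text)
print(serialized)
```

So there is no need to escape twice inside the XML: keep the HTML escaped once in the file, use the parsed text directly for formatted output, and escape it again only at the point where it has to be shown literally.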
