I have an XML file like this:
<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>
And I have this Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    # .text is an attribute, not a method
    print(child.text)
I only get
'My name is' as output.
I want to get
'My name is <b>Wrufesh</b>. What is yours?' as output.
What can I do?
You can use ElementTree.tostringlist() to get the desired output:
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'))
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'
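Slicing with l[2:-2] depends on exactly how tostringlist() splits its output, which may vary between Python versions. A minimal Python 3 sketch of a less position-dependent alternative: take the element's own text, then append the serialization of each child. This works because ET.tostring() writes a child together with its tail (the text after its closing tag). The helper name inner_xml is hypothetical, and the sample is parsed from a string instead of sample.xml so the snippet is self-contained.

```python
import html
import xml.etree.ElementTree as ET


def inner_xml(elem):
    """Return the markup between an element's start and end tags (hypothetical helper).

    elem.text holds the text before the first child; ET.tostring() serializes
    each child element together with its tail text.
    """
    # Re-escape &, < and > in the leading text, as ElementTree does for text nodes
    head = html.escape(elem.text or "", quote=False)
    return head + "".join(ET.tostring(child, encoding="unicode") for child in elem)


root = ET.fromstring(
    "<sample>\n<text>My name is <b>Wrufesh</b>. What is yours?</text>\n</sample>"
)
print(inner_xml(root.find("text")))
# My name is <b>Wrufesh</b>. What is yours?
```

Unlike the tostringlist() slice, this never includes the element's own tail (the trailing '\n'), so nothing has to be trimmed by position.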
I wonder how practical this is for general use.
I don't think it is right to treat the tags in the XML as strings. You can access the text parts of the XML like this:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
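# itertext() yields each text fragment inside <text>, including the text of <b>
for i in text.itertext():
    print(i)
# As you can see, <b> and </b> are a pair of tags, not strings:
# <b> is a child element of <text>.
print(list(text))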
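To make the "tags, not strings" point concrete, here is a small Python 3 sketch of how ElementTree splits the sample's content between an element's text, its child elements, and each child's tail. The sample is parsed from a string rather than sample.xml so it runs on its own.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<sample>\n<text>My name is <b>Wrufesh</b>. What is yours?</text>\n</sample>"

text = ET.fromstring(SAMPLE).find("text")

# Text before the first child element
print(repr(text.text))  # 'My name is '

# <b> is a child Element; its tail is the text that follows </b>
for child in text:
    print(child.tag, repr(child.text), repr(child.tail))  # b 'Wrufesh' '. What is yours?'

# Joining itertext() gives the character data with the markup dropped
plain = "".join(text.itertext())
print(plain)  # My name is Wrufesh. What is yours?
```

This is why the question's loop prints only 'My name is ': the rest of the sentence lives on the <b> child and its tail.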
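I suggest preprocessing the XML file to wrap the content of each <text>
element in a CDATA section. After that you should be able to read the values without any problem.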
<text><![CDATA[My name is <b>Wrufesh</b>. What is yours?]]></text>
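One naive way to do that preprocessing, sketched in Python 3 with a regular expression. The function name wrap_text_in_cdata is hypothetical, and it assumes <text> elements have no attributes, are never nested, and never contain the sequence ]]> (which would end the CDATA section early). After wrapping, the parser treats the whole body as plain character data, so child.text returns the full string including the <b> markup.

```python
import re
import xml.etree.ElementTree as ET

RAW = """<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>"""


def wrap_text_in_cdata(xml_source):
    """Naively wrap the body of every <text>...</text> in a CDATA section.

    Assumes plain <text> tags (no attributes), no nesting, and no ']]>' inside.
    """
    return re.sub(
        r"<text>(.*?)</text>",
        r"<text><![CDATA[\1]]></text>",
        xml_source,
        flags=re.DOTALL,
    )


root = ET.fromstring(wrap_text_in_cdata(RAW))
for child in root:
    print(child.text)  # My name is <b>Wrufesh</b>. What is yours?
```

Note that ElementTree does not preserve the CDATA section itself; if the tree is written back out, the < and > inside the text will be escaped as &lt; and &gt;.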