I have this XML file, and I need to read the values of Sync and Event in the same order as they appear in the file.
<Episode>
  <Section type="report" startTime="0" endTime="263.035">
    <Turn startTime="0" endTime="4.844" speaker="spk1">
      <Sync time="0"/>
      aaaaa
    </Turn>
    <Turn speaker="spk2" startTime="4.844" endTime="15.531">
      <Sync time="4.844"/>
      bbbbb
      <Event desc="poz" type="noise" extent="begin"/>
      ccccc
      <Event desc="poz" type="noise" extent="end"/>
      ddddd
      <Sync time="12.210"/>
      eeeee
    </Turn>
    <Turn speaker="spk1" startTime="15.531" endTime="17.549">
      <Event desc="poz" type="noise" extent="begin"/>
      fffff
    </Turn>
  </Section>
</Episode>
I need this output:
aaaaa
bbbbb
ccccc
ddddd
eeeee
fffff
Is there a solution? Thanks.
Use the built-in sax parser:
from xml import sax

class EpisodeContentHandler(sax.ContentHandler):
    def characters(self, content):
        # Skip the whitespace-only chunks between tags
        content = content.strip()
        if content:
            print(content)

with open("Episode.xml", "rb") as f:
    sax.parse(f, EpisodeContentHandler())
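One caveat with printing directly from characters(): a SAX parser is allowed to deliver a single run of text in several chunks, so a long line could in principle be printed in pieces. A minimal sketch of a buffering variant follows, assuming you want the text as a list rather than printed; the names TextCollector and collect_text are made up for this illustration.

```python
from xml import sax

class TextCollector(sax.ContentHandler):
    """Collects stripped text runs in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.texts = []    # finished text runs
        self._buffer = []  # chunks of the run currently being read

    def _flush(self):
        # Join the buffered chunks and keep the run if it is not blank
        text = "".join(self._buffer).strip()
        if text:
            self.texts.append(text)
        self._buffer = []

    def characters(self, content):
        self._buffer.append(content)

    def startElement(self, name, attrs):
        # Any tag boundary ends the current text run
        self._flush()

    def endElement(self, name):
        self._flush()

def collect_text(xml_string):
    handler = TextCollector()
    sax.parseString(xml_string.encode("utf-8"), handler)
    return handler.texts
```

Calling collect_text() on the contents of the XML file should give the six strings from the question as a list, in file order.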
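Unless you are somehow restricted to Minidom, use 'ElementTree' as Martijn suggested. In my personal experience it is much easier to work with. Its documentation is part of the Python standard library reference (xml.etree.ElementTree).
For your problem, you could try something like this: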
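import xml.etree.ElementTree as ET

# Parse the XML file into a tree
tree = ET.parse("data.xml")
# Get the root/first tag in the tree
root = tree.getroot()
# Walk all elements in document order; the text that follows an
# empty <Sync/> or <Event/> tag is stored in that element's .tail
for child in root.iter():
    if child.tag in ("Sync", "Event"):
        text = (child.tail or "").strip()
        if text:
            print(text)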
Sidenote: child.attrib is a map (dictionary) of all the tag's attributes.
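If you also need the attribute values (for example the time of each Sync), here is a rough sketch that pairs each tag's attributes with the text that follows it, still in file order. The function name sync_event_records is just something I made up for the example, and it takes the XML as a string rather than a file name.

```python
import xml.etree.ElementTree as ET

def sync_event_records(xml_string):
    """Return (tag, attributes, following text) tuples in document order."""
    root = ET.fromstring(xml_string)
    records = []
    for el in root.iter():
        if el.tag in ("Sync", "Event"):
            # Text after an empty element lives in .tail, not .text
            text = (el.tail or "").strip()
            records.append((el.tag, dict(el.attrib), text))
    return records

if __name__ == "__main__":
    sample = (
        '<Turn><Sync time="4.844"/>bbbbb'
        '<Event desc="poz" type="noise" extent="begin"/>ccccc</Turn>'
    )
    for tag, attrs, text in sync_event_records(sample):
        print(tag, attrs, text)
```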
If you insist on using Minidom:
from xml.dom import minidom

elements = minidom.parseString(xml).getElementsByTagName('*')  # where xml is your input xml
for el in elements:
    if el.localName == 'Sync' or el.localName == 'Event':
        # The text after the tag is the next sibling node
        print(el.nextSibling.nodeValue.strip())
This will print:
aaaaa
bbbbb
ccccc
ddddd
eeeee
fffff