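I have an XML file that contains several region boundary definitions. The problem is that the boundary points of each region are not defined by many tags like these:
<point> lat lon </point>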
<point> lat lon </point>
<point> lat lon </point>
<point> lat lon </point>
but in one long tag:
<poslist> lat1 lon1 lat2 lon2 lat3 lon3 lat4 lon4 ..... (and so on) </poslist>
How can I parse such a file?
Regards, Skorek
In Python (2.7), with a file latlon.py containing:
import xml.etree.ElementTree as ET
def text2tuple(coordstr, convert=float):
    """Take a space-delimited string and convert each item; returns a list.

    >>> text2tuple("alfa beta", str)
    ['alfa', 'beta']
    >>> text2tuple("1.1 2.2 3.3", float)
    [1.1, 2.2, 3.3]
    """
    # In Python 2.7, map() returns a list (not a tuple).
    return map(convert, coordstr.split())
def zip2couples(lst):
    """For a list of items, return a half-length list of couples.

    >>> zip2couples([1, 2, 3, 4, 5, 6])
    [(1, 2), (3, 4), (5, 6)]
    """
    # Pair each even-indexed item with the odd-indexed item that follows it.
    return zip(lst[::2], lst[1::2])
def processxmlstr(xmlstr, convert=float):
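    xmldoc = ET.fromstring(xmlstr)
    # Each <point> element holds a single "lat lon" pair.
    print "points", [text2tuple(itm.text, convert) for itm in xmldoc.findall(".//point")]
    # Each <poslist> element holds a flat run of values, grouped here into pairs.
    print "poslists", [zip2couples(text2tuple(itm.text, convert)) for itm in xmldoc.findall(".//poslist")]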
print "string values test -----"
xmlstr = """
<root>
<point>lat lon</point>
<point>lat lon</point>
<point>lat lon</point>
<point>lat lon</point>
<poslist>lat1 lon1 lat2 lon2 lat3 lon3 lat4 lon4</poslist>
</root>
"""
processxmlstr(xmlstr, str)
print "float values test -----"
xmlstr = """
<root>
<point>1.1 11.11</point>
<point>2.2 22.22</point>
<point>3.3 33.33</point>
<point>4.4 44.44</point>
<poslist>10.1 99.1 10.2 99.2 10.3 99.3</poslist>
<poslist>20.1 88.1 20.2 88.2 20.3 88.3</poslist>
</root>
"""
processxmlstr(xmlstr, float)
Run it from the console:
$ python latlon.py
string values test -----
points [['lat', 'lon'], ['lat', 'lon'], ['lat', 'lon'], ['lat', 'lon']]
poslists [[('lat1', 'lon1'), ('lat2', 'lon2'), ('lat3', 'lon3'), ('lat4', 'lon4')]]
float values test -----
points [[1.1, 11.11], [2.2, 22.22], [3.3, 33.33], [4.4, 44.44]]
poslists [[(10.1, 99.1), (10.2, 99.2), (10.3, 99.3)], [(20.1, 88.1), (20.2, 88.2), (20.3, 88.3)]]
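
If you are on Python 3, where map() and zip() return iterators and print is a function, the same idea could be sketched roughly as below. The names parse_poslist and read_regions are made up for this illustration, and the check for an even number of values is an extra assumption of mine, not something the original script does.

```python
import xml.etree.ElementTree as ET


def parse_poslist(text, convert=float):
    """Split a whitespace-delimited string into a list of (lat, lon) pairs."""
    values = [convert(token) for token in text.split()]
    # Assumption: a dangling value means the data is malformed, so fail loudly.
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values, got %d" % len(values))
    # Pair even-indexed items (lat) with the odd-indexed items (lon) after them.
    return list(zip(values[::2], values[1::2]))


def read_regions(xmlstr, tag="poslist", convert=float):
    """Return one list of (lat, lon) pairs per element named `tag`."""
    root = ET.fromstring(xmlstr)
    # An empty element has text None, so fall back to an empty string.
    return [parse_poslist(elem.text or "", convert) for elem in root.iter(tag)]


sample = """
<root>
  <poslist>10.1 99.1 10.2 99.2 10.3 99.3</poslist>
  <poslist>20.1 88.1 20.2 88.2 20.3 88.3</poslist>
</root>
"""

regions = read_regions(sample)
print(regions)
```

Each inner list is one region boundary. If your real file uses XML namespaces or a differently spelled tag, the tag argument would need adjusting to match.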