Parsing an XML document with Python. Cannot use any library that requires pip



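I'm parsing an XML document and need the book title and the numeric value under Score, placed in a 2D list. My current code can retrieve that data and put it into lists, but some sections of the XML file have no score, and I need to leave an indicator in the list (e.g. N/A) to show that the value for that particular book title is empty.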

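Here is a simplified sample of the XML file. Note that this problem recurs throughout the longer version of the file, so no code that uses 1 as a hard-coded index will solve it.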

<bookstore>
<book>[A-23] Everyday Italian</book>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
<field></field>
<key id="6408">[A-23]Everyday Italian</key>
<brief>Everyday Italian</brief>
<success></success>
<province> id="256" key=".com.place.fieldtypes:float">
<name>Post</name>
<numbers>
<number></number>
</numbers>
</province>
<province> id="490" key=".com.ave.fieldtypes:float">
<name>Score</name>
<numbers>
<number>4.0</number>
</numbers>
</province>
<province> id="531" key=".com.spot.fieldtypes:float">
<name>Doc</name>
<numbers>
<number></number>
</numbers>
</province>
</bookstore>
<bookstore>
<book>[A-42] Pottery</book>
<author>Leo Di Plos</author>
<year>2012</year>
<price>25.00</price>
<field></field>
<key id="4502">[A-42] Pottery</key>
<brief>Pottery</brief>
<success></success>
<province> id="627" key=".com.tri.fieldtypes:float">
<name>Post</name>
<numbers>
<number></number>
</numbers>
</province>
<province> id="124" key=".com.doct.fieldtypes:float">
<name>Doc</name>
<numbers>
<number></number>
</numbers>
</province>
</bookstore>
<bookstore>
<book>[A-12] Skipping the Line</book>
<author>Gloria Gasol</author>
<year>1999</year>
<price>22.00</price>
<field></field>
<key id="1468">[A-23]Skipping the Line</key>
<brief>Skipping the Line</brief>
<success></success>
<province> id="754" key=".com.cit.fieldtypes:float">
<name>Post</name>
<numbers>
<number></number>
</numbers>
</province>
<province> id="211" key=".com.soct.fieldtypes:float">
<name>Score</name>
<numbers>
<number>12.0</number>
</numbers>
</province>
<province> id="458" key=".com.lot.fieldtypes:float">
<name>Doc</name>
<numbers>
<number></number>
</numbers>
</province>
</bookstore>

Here is my current code:

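import re

# root is the root element of the parsed document
title = []
for book in root.iter('book'):
    item1 = book.text
    title.append(item1)

score = []
for province in root.iter('province'):
    for child in province:
        for grandchild in child:
            # empty <number></number> elements have .text == None, so guard before matching
            if grandchild.text and re.match(r'^[+-]?\d*\.\d+$', grandchild.text) is not None:
                item2 = float(grandchild.text)
                score.append(item2)
print(title, score)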

The expected output is:

([A-23] Everyday Italian, 4.0), ([A-42] Pottery, N/A), ([A-12] Skipping the Line, 12.0)

But the actual output is:

([A-23] Everyday Italian, 4.0), ([A-42] Pottery, 12.0), ([A-12] Skipping the Line)
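The mismatch comes from collecting titles and scores in two independent, document-wide passes: a bookstore without a Score value contributes nothing to the second list, so every later score slides one position forward when the lists are paired up. A minimal illustration, with hard-coded lists standing in (hypothetically) for the two passes:

```python
# Hypothetical stand-ins for the results of the two document-wide passes
titles = ['[A-23] Everyday Italian', '[A-42] Pottery', '[A-12] Skipping the Line']
scores = [4.0, 12.0]  # Pottery has no score, so nothing was appended for it

# zip() pairs by position and stops at the shorter list, so Pottery
# receives Skipping the Line's score and the last title loses its value
pairs = list(zip(titles, scores))
print(pairs)
```

Keeping the title and its score together requires looking up the score inside each book's own bookstore element rather than across the whole document.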

Python's strength is how quickly you can build a solution, and that includes using ready-made libraries. Why not use a library like xmltodict?

For a single bookstore:

<bookstore>
<book>[A-23] Everyday Italian</book>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
<field></field>
<key id="6408">[A-23]Everyday Italian</key>
<brief>Everyday Italian</brief>
<success></success>
<province> id="256" key=".com.place.fieldtypes:float">
<name>Post</name>
<numbers>
<number></number>
</numbers>
</province>
<province> id="490" key=".com.ave.fieldtypes:float">
<name>Score</name>
<numbers>
<number>4.0</number>
</numbers>
</province>
<province> id="531" key=".com.spot.fieldtypes:float">
<name>Doc</name>
<numbers>
<number></number>
</numbers>
</province>
</bookstore>

Python code:

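import xmltodict

# xml_data holds the single-bookstore XML above as a string
dict_data = xmltodict.parse(xml_data)
title = dict_data["bookstore"]["book"]
# the second <province> (index 1) is the Score entry in this sample
score = dict_data["bookstore"]["province"][1]["numbers"]["number"]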
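Since the question rules out packages installed with pip, and xmltodict is one of them, here is a sketch of the same single-bookstore lookup using only the standard library's xml.etree.ElementTree. It selects the province whose name is Score with an ElementPath predicate instead of relying on index 1, and falls back to 'N/A' when there is no value. The XML string is a trimmed copy of the sample above (the Doc province is omitted for brevity).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the single-bookstore sample
xml_data = '''<bookstore>
<book>[A-23] Everyday Italian</book>
<province> id="256" key=".com.place.fieldtypes:float">
<name>Post</name>
<numbers><number></number></numbers>
</province>
<province> id="490" key=".com.ave.fieldtypes:float">
<name>Score</name>
<numbers><number>4.0</number></numbers>
</province>
</bookstore>'''

root = ET.fromstring(xml_data)
title = root.findtext('book')
# [name='Score'] keeps only <province> elements that have a <name> child with text "Score"
number = root.find("province[name='Score']/numbers/number")
# find() returns None when no Score province exists; .text is None for an empty element
score = number.text if number is not None and number.text else 'N/A'
print(title, score)
```

Because the lookup is by name, it keeps working if the Score province appears at a different position in other bookstores.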

Are you sure your XML is correct? You should create something like a list of bookstore objects, for example:

<BookstoreList>
<Bookstore>
<!-- data here -->
</Bookstore>
<Bookstore>
<!-- data here -->
</Bookstore>
<!-- etc. -->
</BookstoreList>
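As the sample stands, the repeated top-level bookstore elements are not well-formed XML, so ElementTree refuses to parse them. If the file format cannot be changed, one workaround is to wrap the text in a synthetic root element before parsing. This is a sketch that assumes the file contains nothing but the bookstore fragments; the two-book string below is a hypothetical minimal input.

```python
import xml.etree.ElementTree as ET

# Two sibling <bookstore> elements with no common parent, as in the question
raw = '<bookstore><book>A</book></bookstore><bookstore><book>B</book></bookstore>'

try:
    ET.fromstring(raw)
except ET.ParseError as err:
    # expat rejects the second top-level element ("junk after document element")
    print('ParseError:', err)

# Wrapping the fragments in a synthetic root element makes the text well-formed
root = ET.fromstring('<BookstoreList>' + raw + '</BookstoreList>')
books = [bs.findtext('book') for bs in root.findall('bookstore')]
print(books)
```

If the real file starts with an `<?xml ...?>` declaration, it would have to be removed before wrapping, because a declaration is only allowed at the very start of a document.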

Here you go:

import xml.etree.ElementTree as ET
xml = '''<r>
<bookstore>
<book>[A-23] Everyday Italian</book>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
<field></field>
<key id="6408">[A-23]Everyday Italian</key>
<brief>Everyday Italian</brief>
<success></success>
<province> id="256" key=".com.place.fieldtypes:float">
<name>Post</name>
<numbers>
<number></number>
</numbers>
</province>
<province> id="490" key=".com.ave.fieldtypes:float">
<name>Score</name>
<numbers>
<number>4.0</number>
</numbers>
</province>
<province> id="531" key=".com.spot.fieldtypes:float">
<name>Doc</name>
<numbers>
<number></number>
</numbers>
</province>
</bookstore>
<bookstore>
<book>[A-42] Pottery</book>
<author>Leo Di Plos</author>
<year>2012</year>
<price>25.00</price>
<field></field>
<key id="4502">[A-42] Pottery</key>
<brief>Pottery</brief>
<success></success>
<province> id="627" key=".com.tri.fieldtypes:float">
<name>Post</name>
<numbers>
<number></number>
</numbers>
</province>
<province> id="124" key=".com.doct.fieldtypes:float">
<name>Doc</name>
<numbers>
<number></number>
</numbers>
</province>
</bookstore>
<bookstore>
<book>[A-12] Skipping the Line</book>
<author>Gloria Gasol</author>
<year>1999</year>
<price>22.00</price>
<field></field>
<key id="1468">[A-23]Skipping the Line</key>
<brief>Skipping the Line</brief>
<success></success>
<province> id="754" key=".com.cit.fieldtypes:float">
<name>Post</name>
<numbers>
<number></number>
</numbers>
</province>
<province> id="211" key=".com.soct.fieldtypes:float">
<name>Score</name>
<numbers>
<number>12.0</number>
</numbers>
</province>
<province> id="458" key=".com.lot.fieldtypes:float">
<name>Doc</name>
<numbers>
<number></number>
</numbers>
</province>
</bookstore>
</r>
'''
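root = ET.fromstring(xml)
data = []
for bs in root.findall('.//bookstore'):
    book = bs.find('book').text
    # keep only the non-empty <number> values inside this bookstore
    scores = [s.text for s in bs.findall('.//number') if s.text]
    # a bookstore with no non-empty number gets the N/A placeholder
    score = 'N/A' if not scores else scores[0]
    data.append((book, score))
print(data)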

Output:
[('[A-23] Everyday Italian', '4.0'), ('[A-42] Pottery', 'N/A'), ('[A-12] Skipping the Line', '12.0')]
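The loop above takes the first non-empty number anywhere in a bookstore, so it would pick up a Post or Doc value if one of those were filled in. A stricter variant, sketched under the assumption that the score always lives in the province named Score, matches on that name, converts the value to float, and builds the 2D list (a list of lists) the question asks for. The Post value 7.5 for Pottery is hypothetical, added only to show that it is not mistaken for a score; the provinces are also trimmed for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; Pottery's Post value of 7.5 is hypothetical
xml = '''<r>
<bookstore>
<book>[A-23] Everyday Italian</book>
<province><name>Score</name><numbers><number>4.0</number></numbers></province>
</bookstore>
<bookstore>
<book>[A-42] Pottery</book>
<province><name>Post</name><numbers><number>7.5</number></numbers></province>
<province><name>Doc</name><numbers><number></number></numbers></province>
</bookstore>
<bookstore>
<book>[A-12] Skipping the Line</book>
<province><name>Score</name><numbers><number>12.0</number></numbers></province>
</bookstore>
</r>'''

root = ET.fromstring(xml)
data = []
for bs in root.findall('bookstore'):
    # Look only inside the province whose <name> is "Score"
    number = bs.find("province[name='Score']/numbers/number")
    text = number.text.strip() if number is not None and number.text else ''
    data.append([bs.findtext('book'), float(text) if text else 'N/A'])
print(data)
```

Here Pottery gets 'N/A' even though it has a non-empty Post number, whereas the first-non-empty approach would report 7.5 for it.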
