Problem parsing XML when retrieving it from a URL



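I am working on an assignment for the Coursera Python course. The goal is to add up the count for each user name to get a final total.

XML: http://py4e-data.dr-chuck.net/comments_42.xml

If I copy and paste the XML and parse it with the program below, it works fine.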

import xml.etree.ElementTree as ET
input = '''(XML string goes here)'''
ct = 0
stuff = ET.fromstring(input)
lst = stuff.findall('comments/comment')
for item in lst:
    print('Name', item.find('name').text)
    print('Count', item.find('count').text)
    ct = ct + int(item.find('count').text)
print(ct)

The problem appears when I try to fetch the XML directly from the URL. I tried two approaches:

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
lst = commentinfo.findall('comments/comment')
for item in lst:
    print('Count', item.find('count').text)

This results in the following error:

Traceback (most recent call last):
  File "C:UserspatriDesktopPY4EMaterialscode3urllib1.py", line 10, in <module>
    lst = commentinfo.findall('comments/comment')
NameError: name 'commentinfo' is not defined

The second approach is the one the assignment suggests, which accesses the counts like this:

counts = tree.findall('.//count')

So I wrote the following code:

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
counts = tree.findall('.//count')
for item in counts:
    print('Count', item.find('count').text)

This evidently produces a None value, and I don't know how to deal with it:

Traceback (most recent call last):
  File "C:UserspatriDesktopPY4EMaterialscode3urllib1.py", line 12, in <module>
    print('Count', item.find('count').text)
AttributeError: 'NoneType' object has no attribute 'text'

In the first snippet, the error NameError: name 'commentinfo' is not defined occurs because the variable commentinfo is never declared:

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
# commentinfo not declared
lst = commentinfo.findall('comments/comment')
for item in lst:
    print('Count', item.find('count').text)

Replace it with the variable tree, which holds the parsed root element, to make the code work:

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
lst = tree.findall('comments/comment')
for item in lst:
    print('Count', item.find('count').text)
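
To see why the path works once it is called on tree, here is a minimal offline sketch. The SAMPLE string is made-up data, shaped only the way the path 'comments/comment' implies (the root tag name, the names and the counts are assumptions, not the contents of the real file). ET.fromstring returns the root element, so the path is resolved relative to that root and does not include the root's own tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real document at the URL may differ.
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>3</count></comment>
    <comment><name>Bob</name><count>4</count></comment>
  </comments>
</commentinfo>"""

# fromstring returns the root Element, not a wrapper around it.
tree = ET.fromstring(SAMPLE)

total = 0
# The path is relative to the root, so the root tag is not part of it.
for item in tree.findall('comments/comment'):
    total += int(item.find('count').text)

print(total)
```

Here item is a comment element, so item.find('count') does find a child and .text is safe to read.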

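In the second snippet, the expression tree.findall('.//count') already returns a list of the count elements. So when item.find('count') is called inside the loop, it looks for a child named count inside a count element, finds none, and returns None, which leads to the error AttributeError: 'NoneType' object has no attribute 'text'. To fix it, drop item.find('count') from the loop and read item.text directly: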

import urllib.request,urllib.parse, urllib.error
import xml.etree.ElementTree as ET
uh = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

data = uh.read()
print(data.decode())
tree = ET.fromstring(data)
counts = tree.findall('.//count')
for item in counts:
    print('Count', item.text)
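
Since the stated goal is the final total rather than the printed counts, the pieces above could be combined roughly as follows. This is only a sketch: sum_counts and sum_counts_from_url are hypothetical helper names, and the sample bytes are made-up data standing in for the real file so the example runs without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET


def sum_counts(xml_data):
    """Return the sum of the integer text of every <count> element."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    return sum(int(count.text) for count in root.findall('.//count'))


def sum_counts_from_url(url):
    """Fetch the XML at url and sum its counts (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return sum_counts(response.read())


# Hypothetical sample data; the real document at the URL may differ.
sample = b"""<commentinfo>
  <comments>
    <comment><name>Alice</name><count>3</count></comment>
    <comment><name>Bob</name><count>4</count></comment>
  </comments>
</commentinfo>"""

# A count element has no child named count, so find() returns None here.
first_count = ET.fromstring(sample).find('.//count')
print(first_count.find('count'))

print(sum_counts(sample))
```

With a network connection, sum_counts_from_url('http://py4e-data.dr-chuck.net/comments_42.xml') should give the total for the assignment file.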
