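I'm new to Python and am having trouble parsing XML from a URL source. I've tried quite a few ways to fix the code, but I'm stuck. Any suggestions would be very helpful!
The program fails at 'xtree = ET.parse(fhand)'.
Error: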
Traceback (most recent call last):
  File "C:UsersSimeonDesktopPy4eex13_1.py", line 12, in <module>
    xtree = ET.parse(fhand)
  File "C:Program FilesWindowsAppsPythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0libxmletreeElementTree.py", line 1222, in parse
    tree.parse(source, parser)
  File "C:Program FilesWindowsAppsPythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0libxmletreeElementTree.py", line 580, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
Code:
import xml.etree.ElementTree as ET
import urllib.request
from urllib.request import urlopen
fhand = urllib.request.urlopen('https://py4e-data.dr-chuck.net/comments_42.xml')
sum = 0
for line in fhand:
    sum = sum + len(line)
print(sum)
xtree = ET.parse(fhand)
xroot = xtree.getroot()
xlist = xtree.findall()
print(len(xlist))
lst = xtree.findall('comments/comment')
print('count:' , len(lst))
flist = []
for item in lst:
    num = item.find('count').text
    flist.append(num)
for i in range(0, len(flist)):
    flist[i] = int(flist[i])
print(sum(flist))
I tried converting to a string, but ET.parse needs a bytes-like object. I also got a lot of HTTPResponse errors and I'm not sure why.
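fhand is a file handle. When you loop over it to compute the length of the response, you move the cursor to the end of the stream, so there is nothing left for ET.parse to read. With an ordinary file you could rewind with fhand.seek(0), but an HTTPResponse cannot be rewound, so it is better to take the length from the headers of the HTTPResponse.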
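The exhaustion effect can be reproduced without a network call. The sketch below is only an illustration: it uses io.BytesIO as a stand-in for the HTTP response, and the SAMPLE payload is made up (its element names mirror the ones used in the question, but it is not the real feed).

```python
import io
import xml.etree.ElementTree as ET

# Made-up payload standing in for the response body (an assumption for illustration).
SAMPLE = (
    b"<commentinfo><comments>"
    b"<comment><name>A</name><count>3</count></comment>"
    b"<comment><name>B</name><count>4</count></comment>"
    b"</comments></commentinfo>"
)

stream = io.BytesIO(SAMPLE)

# Iterating over the stream reads it to the end, just like the loop in the question.
total = 0
for line in stream:
    total += len(line)

# Nothing is left to read once the loop has finished.
leftover = stream.read()

# Parsing the exhausted stream fails because the parser sees no data at all.
try:
    ET.parse(stream)
    parse_error = None
except ET.ParseError as exc:
    parse_error = str(exc)

print(f"Bytes counted: {total}")
print(f"Left to read: {leftover!r}")
print(f"Parse error: {parse_error}")
```

The parser receives an empty input here, which is the same situation as in the traceback from the question.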
Rewritten code:
import xml.etree.ElementTree as ET
import urllib.request
from urllib.request import urlopen
fhand = urllib.request.urlopen('https://py4e-data.dr-chuck.net/comments_42.xml')
print(f"Content-Length: {fhand.getheader('Content-Length')}")
xtree = ET.parse(fhand)
xroot = xtree.getroot()
lst = xtree.findall('.//comment')
print(f"Count: {len(lst)}")
flist = [int(item.find('count').text) for item in lst]
print(f"Sum: {sum(flist)}")
Output:
Content-Length: 4189
Count: 50
Sum: 2553
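One caveat: getheader('Content-Length') returns None when the server does not send that header. An alternative, sketched below under assumptions, is to read the body once into bytes and reuse it for both the length and the parse. The helper name summarize and the SAMPLE payload are hypothetical and only stand in for the real feed. The sketch also avoids rebinding the name sum, which in the original script would make the later sum(flist) call fail with a TypeError.

```python
import xml.etree.ElementTree as ET

# Made-up payload standing in for the downloaded bytes (an assumption for illustration).
SAMPLE = (
    b"<commentinfo><comments>"
    b"<comment><name>A</name><count>3</count></comment>"
    b"<comment><name>B</name><count>4</count></comment>"
    b"</comments></commentinfo>"
)


def summarize(data):
    """Return (byte length, number of comments, sum of counts) for an XML payload."""
    root = ET.fromstring(data)  # parse from bytes already held in memory
    counts = [int(node.findtext("count")) for node in root.iter("comment")]
    return len(data), len(counts), sum(counts)


# With the real feed, the bytes would come from urllib.request.urlopen(url).read().
size, count, total = summarize(SAMPLE)
print(f"Bytes: {size}")
print(f"Count: {count}")
print(f"Sum: {total}")
```

Because the bytes stay in memory, they can be measured and parsed as often as needed without touching the response stream again.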
If you want to do data analysis, use Pandas:
# pip install pandas
import pandas as pd
df = pd.read_xml('https://py4e-data.dr-chuck.net/comments_42.xml', xpath='.//comment')
print(f"Count: {len(df)}")
print(f"Sum: {df['count'].sum()}")
Details:
>>> df
           name  count
0        Romina     97
1        Laurie     97
2         Bayli     90
3        Siyona     90
4        Taisha     88
5        Alanda     87
6       Ameelia     87
7     Prasheeta     80
8          Asif     79
9          Risa     79
10           Zi     78
11       Danyil     76
12       Ediomi     76
13        Barry     72
14        Lance     72
15       Hattie     66
16        Mathu     66
17        Bowie     65
18       Samara     65
19      Uchenna     64
20       Shauni     61
21      Georgia     61
22        Rivan     59
23        Kenan     58
24       Hassan     57
25         Isma     57
26  Samanthalee     54
27        Alexa     51
28        Caine     49
29        Grady     47
30         Anne     40
31        Rihan     38
32       Alexei     37
33        Indie     36
34    Rhuairidh     36
35    Annoushka     32
36        Kenzi     25
37        Shahd     24
38       Irvine     22
39        Carys     21
40         Skye     19
41        Atiya     18
42        Rohan     18
43        Nuala     14
44        Maram     12
45        Carlo     12
46      Japleen      9
47     Breeanna      7
48       Zaaine      3
49        Inika      2