ElementTree ParseError when parsing XML from a URL in Python



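I'm new to Python and having trouble turning data fetched from a URL into XML. I've tried quite a few ways to fix the code, but I'm stuck. Any advice would be very helpful!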

The program fails at `xtree = ET.parse(fhand)` in the code below.

Error:

```
Traceback (most recent call last):
  File "C:\Users\Simeon\Desktop\Py4e\ex13_1.py", line 12, in <module>
    xtree = ET.parse(fhand)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\xml\etree\ElementTree.py", line 1222, in parse
    tree.parse(source, parser)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\xml\etree\ElementTree.py", line 580, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
```

Code:

```python
import xml.etree.ElementTree as ET
import urllib.request
from urllib.request import urlopen
fhand = urllib.request.urlopen('https://py4e-data.dr-chuck.net/comments_42.xml')
sum = 0
for line in fhand:
    sum = sum + len(line)
print(sum)
xtree = ET.parse(fhand)
xroot = xtree.getroot()
xlist = xtree.findall()
print(len(xlist))
lst = xtree.findall('comments/comment')
print('count:' , len(lst))
flist = []
for item in lst:
    num = item.find('count').text
    flist.append(num)
for i in range(0, len(flist)):
    flist[i] = int(flist[i])
print(sum(flist))
```

I tried converting it to a string, but `ET.parse` needs a bytes-like object. I also get a lot of `HTTPResponse` errors and I'm not sure why.

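`fhand` is a file-like handle. When you compute the length of the response by looping over it, you read it all the way to the end, so there is nothing left for `ET.parse` to read, which is why it reports `no element found: line 1, column 0`. With an ordinary file you could rewind with `fhand.seek(0)`, but the object returned by `urlopen` (an `http.client.HTTPResponse`) is a one-shot network stream that does not support seeking. Either read the body once into a `bytes` object and reuse it, or, better, take the size from the `HTTPResponse` headers and leave the stream untouched for the parser.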
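To see why the parser reports `no element found`, the failure can be reproduced offline. This is an illustrative sketch under stated assumptions: `SAMPLE` is a hypothetical two-comment document standing in for comments_42.xml, `io.BytesIO` stands in for the HTTP response (unlike an `HTTPResponse`, a `BytesIO` is seekable, but iterating over it consumes it in the same way), and the helper names are made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded document (NOT the real comments_42.xml)
SAMPLE = b"""<commentinfo>
  <comments>
    <comment><name>Romina</name><count>97</count></comment>
    <comment><name>Laurie</name><count>97</count></comment>
  </comments>
</commentinfo>"""


def count_then_parse(stream):
    """Reproduce the question's bug: iterate over the stream, then parse it."""
    total = sum(len(line) for line in stream)  # consumes every byte
    try:
        ET.parse(stream)  # the stream is exhausted, so the parser sees no data
    except ET.ParseError as exc:
        return total, str(exc)
    return total, None


def read_once_then_parse(stream):
    """Fix: read the body once, then reuse the bytes for both tasks."""
    data = stream.read()
    root = ET.fromstring(data)  # fromstring accepts bytes as well as str
    return len(data), root


total, error = count_then_parse(io.BytesIO(SAMPLE))
print(total, error)  # error text starts with "no element found"

size, root = read_once_then_parse(io.BytesIO(SAMPLE))
print(size, len(root.findall('.//comment')))
```

With the real URL, the same idea means calling `data = fhand.read()` once and passing `data` to both `len()` and `ET.fromstring()`.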

Rewritten code:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

url = 'https://py4e-data.dr-chuck.net/comments_42.xml'
with urlopen(url) as fhand:
    # Take the size from the response header instead of reading the stream
    print(f"Content-Length: {fhand.getheader('Content-Length')}")
    # The stream has not been consumed yet, so ET.parse can read all of it
    xtree = ET.parse(fhand)

xroot = xtree.getroot()
# './/comment' matches <comment> elements at any depth
lst = xtree.findall('.//comment')
print(f"Count: {len(lst)}")
flist = [int(item.find('count').text) for item in lst]
print(f"Sum: {sum(flist)}")
```
Output:

```
Content-Length: 4189
Count: 50
Sum: 2553
```
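The rewrite also changes the search path. `'comments/comment'` is resolved relative to the root element, while `'.//comment'` matches `<comment>` elements at any depth, so it keeps working if the nesting differs. Two other lines of the original would have failed even after the parsing problem was fixed: `xtree.findall()` requires a path argument and raises `TypeError` without one, and `sum = 0` shadows the built-in `sum`, so the final `sum(flist)` call raises `TypeError` too. A small offline sketch, assuming a hypothetical three-comment miniature of the document:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the comments document (the real file has 50 <comment> elements)
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Romina</name><count>97</count></comment>
    <comment><name>Laurie</name><count>97</count></comment>
    <comment><name>Bayli</name><count>90</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE)
tree = ET.ElementTree(root)

# Relative to the root element: only <comment> children of <comments>
relative = tree.findall('comments/comment')
# './/' searches every descendant, regardless of nesting depth
anywhere = tree.findall('.//comment')

# findtext() returns the text of the first matching sub-element (or None)
counts = [int(c.findtext('count')) for c in anywhere]

print(len(relative), len(anywhere), sum(counts))  # 3 3 284
```

Because the built-in `sum` is not shadowed here, the final total works as expected.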

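If you want to do data analysis, consider pandas:

```python
# pip install pandas lxml
# pd.read_xml was added in pandas 1.3 and uses the lxml parser by default
import pandas as pd

df = pd.read_xml('https://py4e-data.dr-chuck.net/comments_42.xml', xpath='.//comment')
print(f"Count: {len(df)}")
print(f"Sum: {df['count'].sum()}")
```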

Details:

```
>>> df
           name  count
0        Romina     97
1        Laurie     97
2         Bayli     90
3        Siyona     90
4        Taisha     88
5        Alanda     87
6       Ameelia     87
7     Prasheeta     80
8          Asif     79
9          Risa     79
10           Zi     78
11       Danyil     76
12       Ediomi     76
13        Barry     72
14        Lance     72
15       Hattie     66
16        Mathu     66
17        Bowie     65
18       Samara     65
19      Uchenna     64
20       Shauni     61
21      Georgia     61
22        Rivan     59
23        Kenan     58
24       Hassan     57
25         Isma     57
26  Samanthalee     54
27        Alexa     51
28        Caine     49
29        Grady     47
30         Anne     40
31        Rihan     38
32       Alexei     37
33        Indie     36
34    Rhuairidh     36
35    Annoushka     32
36        Kenzi     25
37        Shahd     24
38       Irvine     22
39        Carys     21
40         Skye     19
41        Atiya     18
42        Rohan     18
43        Nuala     14
44        Maram     12
45        Carlo     12
46      Japleen      9
47     Breeanna      7
48       Zaaine      3
49        Inika      2
```
