Python code worked fine yesterday, but now I get "IndexError: list index out of range"



I am trying to scrape data from a website:

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.mohfw.gov.in/"
r = requests.get(url)
html =r.text
soup = BeautifulSoup(html,'html.parser')
#print(soup)
id = soup.find('div',id='cases')
table_body = id.find('tbody')
table_rows = table_body.find_all('tr')
sl_no = []
States = []
Cases = []
Recovered = []
Deaths = []

I then loop over the table rows and append their cells to the empty lists above, but I get an error:

for tr in table_rows:
    td = tr.find_all('td')
    sl_no.append(td[0].text)
    States.append(td[1].text)
    Cases.append(td[2].text)
    Recovered.append(td[3].text)
    Deaths.append(td[-1].text)

headers = ['sl_no','States','Cases','Recovered','Deaths']
df = pd.DataFrame(list(zip(sl_no,States,Cases,Recovered,Deaths)),columns=headers)
df1 = df.drop(index=27)

This is my error:

States.append(td[1].text)
IndexError: list index out of range

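You can check the length of each td list. The problem is that the last one has length 1, so selecting the second value of the list with td[1] raises the error: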

for tr in table_rows:
    td = tr.find_all('td')
    print (len(td))
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
4
1
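If the table is long, a tally of the row lengths may be easier to read than one number per line. The following is only a sketch: the `rows` list is made-up stand-in data (lists of cell texts), and `count_row_lengths` is a hypothetical helper name. In the scraper, each inner list would come from `[td.text for td in tr.find_all('td')]`.

```python
from collections import Counter

# Hypothetical stand-in for the parsed table: each inner list holds the
# cell texts of one <tr>. The values are illustrative, not live data.
rows = [
    ["1", "Andhra Pradesh", "23", "1", "0"],
    ["2", "Bihar", "15", "0", "1"],
    ["Total", "38", "1", "1"],   # a row with only 4 cells
    ["some note"],               # a row with a single cell
]

def count_row_lengths(rows):
    """Return a Counter mapping number of cells -> number of rows."""
    return Counter(len(cells) for cells in rows)

length_counts = count_row_lengths(rows)
print(length_counts)
```

Any length other than the expected one points at the rows that need special handling.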

So your solution should be changed to keep only the rows whose td list has length 5:

for tr in table_rows:
    td = tr.find_all('td')
    if len(td) == 5:
        sl_no.append(td[0].text)
        States.append(td[1].text)
        Cases.append(td[2].text)
        Recovered.append(td[3].text)
        Deaths.append(td[-1].text)
headers = ['sl_no','States','Cases','Recovered','Deaths']
df = pd.DataFrame(list(zip(sl_no,States,Cases,Recovered,Deaths)),columns=headers)
print (df)
sl_no                       States Cases Recovered Deaths
0      1               Andhra Pradesh    23         1      0
1      2  Andaman and Nicobar Islands     9         0      0
2      3                        Bihar    15         0      1
3      4                   Chandigarh     8         0      0
4      5                 Chhattisgarh     7         0      0
5      6                        Delhi    87         6      2
6      7                          Goa     5         0      0
7      8                      Gujarat    69         1      6
8      9                      Haryana    36        18      0
9     10             Himachal Pradesh     3         0      1
10    11            Jammu and Kashmir    48         2      2
11    12                    Karnataka    83         5      3
12    13                       Kerala   202        19      1
13    14                       Ladakh    13         3      0
14    15               Madhya Pradesh    47         0      3
15    16                  Maharashtra   198        25      8
16    17                      Manipur     1         0      0
17    18                      Mizoram     1         0      0
18    19                       Odisha     3         0      0
19    20                   Puducherry     1         0      0
20    21                       Punjab    38         1      1
21    22                    Rajasthan    59         3      0
22    23                   Tamil Nadu    67         4      1
23    24                    Telengana    71         1      1
24    25                  Uttarakhand     7         2      0
25    26                Uttar Pradesh    82        11      0
26    27                  West Bengal    22         0      2
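The same length filter can also collect whole rows as tuples instead of filling five parallel lists. This is a minimal sketch under assumptions: `complete_rows` and `EXPECTED_CELLS` are hypothetical names, and `rows` is made-up stand-in data rather than the live page.

```python
EXPECTED_CELLS = 5  # sl_no, state, cases, recovered, deaths

def complete_rows(rows, expected=EXPECTED_CELLS):
    """Keep only rows with exactly `expected` cells, each as a tuple."""
    return [tuple(cells) for cells in rows if len(cells) == expected]

# Hypothetical stand-in for the cell texts of each <tr>.
rows = [
    ["1", "Andhra Pradesh", "23", "1", "0"],
    ["2", "Bihar", "15", "0", "1"],
    ["Total", "38", "1", "1"],   # too short, skipped
    ["some note"],               # too short, skipped
]

records = complete_rows(rows)
print(records)
```

A list of tuples like `records` can be passed straight to `pd.DataFrame(records, columns=headers)`, which avoids the `zip` step.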

I think you can also simplify the code with read_html:

url = "https://www.mohfw.gov.in/"
df = pd.read_html(url)[-1]

Then remove the last 2 rows:

df = df.iloc[:-2]

print (df)
S. No.           Name of State / UT Total Confirmed cases *  
0       1               Andhra Pradesh                      23   
1       2  Andaman and Nicobar Islands                       9   
2       3                        Bihar                      15   
3       4                   Chandigarh                       8   
4       5                 Chhattisgarh                       7   
5       6                        Delhi                      87   
6       7                          Goa                       5   
7       8                      Gujarat                      69   
8       9                      Haryana                      36   
9      10             Himachal Pradesh                       3   
10     11            Jammu and Kashmir                      48   
11     12                    Karnataka                      83   
12     13                       Kerala                     202   
13     14                       Ladakh                      13   
14     15               Madhya Pradesh                      47   
15     16                  Maharashtra                     198   
16     17                      Manipur                       1   
17     18                      Mizoram                       1   
18     19                       Odisha                       3   
19     20                   Puducherry                       1   
20     21                       Punjab                      38   
21     22                    Rajasthan                      59   
22     23                   Tamil Nadu                      67   
23     24                    Telengana                      71   
24     25                  Uttarakhand                       7   
25     26                Uttar Pradesh                      82   
26     27                  West Bengal                      22   
Cured/Discharged/Migrated Death  
0                          1     0  
1                          0     0  
2                          0     1  
3                          0     0  
4                          0     0  
5                          6     2  
6                          0     0  
7                          1     6  
8                         18     0  
9                          0     1  
10                         2     2  
11                         5     3  
12                        19     1  
13                         3     0  
14                         0     3  
15                        25     8  
16                         0     0  
17                         0     0  
18                         0     0  
19                         0     0  
20                         1     1  
21                         3     0  
22                         4     1  
23                         1     1  
24                         2     0  
25                        11     0  
26                         0     2  

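One of the <tr> elements does not seem to contain all the <td> elements you expect.

From a quick look at the data itself, the last <tr> appears to hold some kind of summary for all the states. In that case, you should probably leave the last <tr> out of the for loop: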

for tr in table_rows[:-1]:

Or wrap the loop body in a try/except:

for tr in table_rows:
    try:
        td = tr.find_all('td')
        sl_no.append(td[0].text)
        States.append(td[1].text)
        Cases.append(td[2].text)
        Recovered.append(td[3].text)
        Deaths.append(td[-1].text)
    except Exception as e:
        # Pass or handle the exception as you wish.
        pass
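One thing to watch with the try/except approach: if a row fails partway through, the appends that already ran stay in place, so the lists can end up with different lengths. A possible variant, sketched below, reads every cell first and appends only if all lookups succeed. `append_row` is a hypothetical helper, and it takes plain lists of cell texts rather than BeautifulSoup tags.

```python
def append_row(cells, columns):
    """Append one row to the five parallel lists in `columns`.

    All cells are read before anything is appended, so a row that is
    too short adds nothing. Returns True if the row was appended.
    """
    try:
        values = (cells[0], cells[1], cells[2], cells[3], cells[-1])
    except IndexError:
        return False
    for column, value in zip(columns, values):
        column.append(value)
    return True

sl_no, States, Cases, Recovered, Deaths = [], [], [], [], []
columns = [sl_no, States, Cases, Recovered, Deaths]

append_row(["1", "Andhra Pradesh", "23", "1", "0"], columns)  # appended
append_row(["some note"], columns)                            # skipped
print([len(column) for column in columns])
```

Note that a row with only 4 cells still passes this check, because indices 0 to 3 and -1 all exist. The explicit `len(td) == 5` test is the stricter filter.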
