Why do I get an error when trying to access the first two columns of an HTML table?


import requests
from bs4 import BeautifulSoup
wiki = "https://en.wikipedia.org/wiki/List_of_Pixar_films"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url, 'lxml')
table_class = "wikitable plainrowheaders sortable"
my_table = soup.find('table', {'class': table_class})

Film = []
release = []
for row in my_table.find_all('i')[0:]:
    Film_cell = row.find_all('a')[0]
    Film.append(Film_cell.text)
print(Film)
for row in my_table.find_all('td')[0:]:
    release = row.find_all('span')[:1]
    release.append(release.text)
print(release)

Output:

['Toy Story', "A Bug's Life", 'Toy Story 2', 'Monsters, Inc.',
'Finding Nemo', 'The Incredibles', 'Cars', 'Ratatouille', 'WALL-E',
'Up', 'Toy Story 3', 'Cars 2', 'Brave', 'Monsters University', 'Inside Out',
'The Good Dinosaur', 'Finding Dory', 'Cars 3', 'Coco', 'Incredibles 2',
'Toy Story 4', 'Onward', 'Soul', 'Luca', 'Turning Red', 'Lightyear']
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-223-6481bc092354> in <module>
7 for row in my_table.find_all('td')[0:]:
8     release = row.find_all('span')[:1]
----> 9     release.append(release.text)
10 print(release)
AttributeError: 'list' object has no attribute 'text'
for row in my_table.find_all('td')[0:]:
    release = row.find_all('span')[:1]
    release.append(release.text)
print(release)

  • my_table.find_all('td')[0:] is the same as my_table.find_all('td').
  • row.find_all('span')[:1] is a list; you probably meant row.find_all('span')[0].
  • release = row.find_all('span')[:1] rebinds the release list on every iteration; use a different variable for the spans.
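The difference between slicing and indexing in the points above can be seen with ordinary lists. This is a minimal sketch: plain strings stand in for the span tags (an assumption made for illustration, so that no network access or parsing is needed), and all variable names are hypothetical.

```python
# Plain strings stand in for the <span> tags that find_all() would return.
spans = ["November 22, 1995", "June 9, 2006"]

as_slice = spans[:1]   # slicing always yields a list, here with one element
as_item = spans[0]     # indexing yields the element itself

# A list has no .text attribute, which is what the AttributeError reports.
slice_has_text = hasattr(as_slice, "text")

# On an empty list, slicing is safe but indexing raises IndexError.
empty = []
empty_slice = empty[:1]
try:
    empty[0]
    index_failed = False
except IndexError:
    index_failed = True

print(as_slice, as_item, slice_has_text, empty_slice, index_failed)
```

This is why a cell without any span needs a check before its first element is accessed.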

Get the first two columns, excluding the index column.

release = []
for row in my_table.find_all('td'):
    span = row.find_all('span')
    if span:
        release.append(span[0].text)
print(release)
[('Toy Story', 'November 22, 1995'), ("A Bug's Life", 'November 25, 1998'), ('Toy Story 2', 'November 24, 1999'), ('Monsters, Inc.', 'November 2, 2001'), ('Finding Nemo', 'May 30, 2003'), ('The Incredibles', 'November 5, 2004'), ('Cars', 'June 9, 2006'), ('Ratatouille', 'June 29, 2007'), ('WALL-E', 'June 27, 2008'), ('Up', 'May 29, 2009'), ('Toy Story 3', 'June 18, 2010'), ('Cars 2', 'June 24, 2011'), ('Brave', 'June 22, 2012'), ('Monsters University', 'June 21, 2013'), ('Inside Out', 'June 19, 2015'), ('The Good Dinosaur', 'November 25, 2015'), ('Finding Dory', 'June 17, 2016'), ('Cars 3', 'June 16, 2017'), ('Coco', 'November 22, 2017'), ('Incredibles 2', 'June 15, 2018'), ('Toy Story 4', 'June 21, 2019'), ('Onward', 'March 6, 2020'), ('Soul', 'December 25, 2020'), ('Luca', 'June 18, 2021'), ('Turning Red[1]', 'March 11, 2022[5]'), ('Lightyear[2]', 'June 17, 2022[5]'), ('TBA', 'June 16, 2023[8]'), ('TBA', 'March 1, 2024[4]'), ('TBA', 'June 14, 2024[4]')]
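One way to get title and date pairs like the ones above is to walk the table row by row instead of collecting the two columns in separate loops. The following is only a sketch under stated assumptions: SAMPLE_HTML is a made-up miniature table, not the real Wikipedia markup, and first_two_columns is a hypothetical helper name. On the live page the cell positions may differ, so the indices would need checking.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration; the real page may be structured differently.
SAMPLE_HTML = """
<table class="wikitable plainrowheaders sortable">
  <tr><th>Film</th><th>Release date</th></tr>
  <tr><th scope="row"><i><a href="#">Toy Story</a></i></th><td><span>November 22, 1995</span></td></tr>
  <tr><th scope="row"><i><a href="#">A Bug's Life</a></i></th><td><span>November 25, 1998</span></td></tr>
</table>
"""


def first_two_columns(table):
    """Return (first cell text, second cell text) for every data row of the table."""
    pairs = []
    for tr in table.find_all("tr"):
        # Rows without any <td> are treated as header rows and skipped.
        if tr.find("td") is None:
            continue
        cells = tr.find_all(["th", "td"])
        if len(cells) < 2:
            continue
        pairs.append((cells[0].get_text(strip=True), cells[1].get_text(strip=True)))
    return pairs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="wikitable")
pairs = first_two_columns(table)
print(pairs)
```

Reading both cells from the same row keeps each title aligned with its own date, even when some rows lack a span.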

The code release = row.find_all('span')[:1] produces a list, and a list has no "text" attribute. You need to index into it to get the text, i.e. release.append(release[0].text) instead of release.append(release.text).

But this would also raise an "index out of range" error, because many of the lists produced in the loop are empty.

The revised code is as follows:

import requests
from bs4 import BeautifulSoup
wiki = "https://en.wikipedia.org/wiki/List_of_Pixar_films"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url, 'lxml')
table_class = "wikitable plainrowheaders sortable"
my_table = soup.find('table', {'class': table_class})

Film = []
release = []
# Film titles: the text of the first link inside each italic element
for row in my_table.find_all('i')[0:]:
    Film_cell = row.find_all('a')[0]
    Film.append(Film_cell.text)
print(Film)
new_list = []
# Release dates: the first span of each cell, skipping cells without a span
for row in my_table.find_all('td')[0:]:
    release = row.find_all('span')[:1]
    if len(release) > 0:
        new_list.append(release[0].text)
print(new_list)
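As an alternative to slicing and then checking the length, find() returns the first matching tag or None, so the emptiness check becomes a None check. This is a sketch on a made-up two-cell snippet (an assumption for illustration, not the real page); the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up sample: one cell contains a span, the other does not.
html = "<table><tr><td><span>November 22, 1995</span></td><td>no span here</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

release_dates = []
for cell in soup.find_all("td"):
    span = cell.find("span")  # first <span> in the cell, or None if there is none
    if span is not None:
        release_dates.append(span.get_text(strip=True))
print(release_dates)
```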
