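I'm very new to web scraping in Python. I want to extract the movie names, release years, and ratings from IMDb. This is the IMDb page with 250 movies and their ratings: https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm. I'm using the BeautifulSoup and requests modules. Here is my code: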
movies = bs.find('tbody',class_='lister-list').find_all('tr')
When I try to extract the movie name, rating, and year, I get the same AttributeError each time. Here is the HTML for one row:
<td class="titleColumn">
<a href="/title/tt11564570/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=ea4e08e1-c8a3-47b5-ac3a-75026647c16e&pf_rd_r=BQWZRBFAM81S7K6ZBPJP&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1" title="Rian Johnson (dir.), Daniel Craig, Edward Norton">Glass Onion: une histoire à couteaux tirés</a>
<span class="secondaryInfo">(2022)</span>
<div class="velocity">1
<span class="secondaryInfo">(
<span class="global-sprite telemeter up"></span>
1)</span>
<td class="ratingColumn imdbRating">
<strong title="7,3 based on 207 962 user ratings">7,3</strong>
title = movies.find('td',class_='titleColumn').a.text
rating = movies.find('td',class_='ratingColumn imdbRating').strong.text
year = movies.find('td',class_='titleColumn').span.text.strip('()')
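AttributeError                            Traceback (most recent call last)
<ipython-input-9-2363bafd916b> in <module>
----> 1 title = movies.find('td',class_='titleColumn').a.text
      2 title

~\anaconda3\lib\site-packages\bs4\element.py in __getattr__(self, key)
   2287     def __getattr__(self, key):
   2288         """Raise a helpful exception to explain a common code fix."""
-> 2289         raise AttributeError(
   2290             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2291         )

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?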
Can anyone help me fix this? Thanks in advance!
To collect the rows of the ResultSet into a list, you can try the following example:
from bs4 import BeautifulSoup
import requests
import pandas as pd
data = []
res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm")
#print(res)
soup = BeautifulSoup(res.content, "html.parser")
# Loop over each <tr> row; each card is a single Tag, so select_one() works on it
for card in soup.select('.chart.full-width tbody tr'):
    data.append({
        "title": card.select_one('.titleColumn a').get_text(strip=True),
        "year": card.select_one('.titleColumn span').text,
        "rating": card.select_one('td[class="ratingColumn imdbRating"]').get_text(strip=True)
    })
df = pd.DataFrame(data)
print(df)
#df.to_csv('out.csv', index=False)
Output:
title year rating
0 Avatar: The Way of Water (2022) 7.9
1 Glass Onion (2022) 7.2
2 The Menu (2022) 7.3
3 White Noise (2022) 5.8
4 The Pale Blue Eye (2022) 6.7
.. ... ... ...
95 Zoolander (2001) 6.5
96 Once Upon a Time in Hollywood (2019) 7.6
97 The Lord of the Rings: The Fellowship of the Ring (2001) 8.8
98 New Year's Eve (2011) 5.6
99 Spider-Man: No Way Home (2021) 8.2
[100 rows x 3 columns]
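If the live page can't be fetched, or its markup has changed since this answer was written, the same select()/select_one() approach can be tried offline. The sketch below parses a trimmed copy of the row HTML from the question; `SAMPLE_HTML`, the second (unrated) row, and the helper names `text_or_none` and `parse_rows` are hypothetical. The None fallback assumes that an unrated movie simply has no `<strong>` inside its rating cell.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the chart markup from the question. The second row is
# hypothetical and has no <strong> rating, to show the None fallback.
SAMPLE_HTML = """
<table class="chart full-width">
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn">
        <a href="/title/tt11564570/">Glass Onion: une histoire à couteaux tirés</a>
        <span class="secondaryInfo">(2022)</span>
      </td>
      <td class="ratingColumn imdbRating"><strong>7,3</strong></td>
    </tr>
    <tr>
      <td class="titleColumn">
        <a href="#">Hypothetical Unrated Movie</a>
        <span class="secondaryInfo">(2023)</span>
      </td>
      <td class="ratingColumn imdbRating"></td>
    </tr>
  </tbody>
</table>
"""


def text_or_none(row, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = row.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_rows(html):
    """Turn each <tr> of the chart into a dict of raw strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # select() returns a list of single <tr> Tags, so select_one() works on each.
    for tr in soup.select("tbody.lister-list tr"):
        rows.append({
            "title": text_or_none(tr, "td.titleColumn a"),
            "year": text_or_none(tr, "td.titleColumn span.secondaryInfo"),
            "rating": text_or_none(tr, "td.ratingColumn.imdbRating strong"),
        })
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_HTML):
        print(row)
```

Returning None instead of calling `.get_text()` on a missing node avoids a second kind of AttributeError (`'NoneType' object has no attribute ...`) on rows that lack a cell.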
Update: extracting the data with the find_all() and find() methods instead:
from bs4 import BeautifulSoup
import requests
import pandas as pd
headers = {'User-Agent':'Mozilla/5.0'}
data = []
# Pass the headers to requests.get(), otherwise the User-Agent is never sent
res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm", headers=headers)
#print(res)
soup = BeautifulSoup(res.content, "html.parser")
# find_all() returns a ResultSet; call find() on each row, not on the ResultSet
for card in soup.table.tbody.find_all("tr"):
    data.append({
        "title": card.find("td", class_="titleColumn").a.get_text(strip=True),
        "year": card.find("td", class_="titleColumn").span.get_text(strip=True),
        "rating": card.find("td", class_="ratingColumn imdbRating").get_text(strip=True)
    })
df = pd.DataFrame(data)
print(df)
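Both versions store the year and rating as strings such as "(2022)" and "7.9", and the HTML in the question shows a comma decimal ("7,3"), presumably because that page was served in French. A minimal sketch for converting them to numbers, assuming only those two formats; the function names `parse_year` and `parse_rating` are hypothetical:

```python
import re


def parse_year(text):
    """Return the first four-digit number in text such as '(2022)', else None."""
    match = re.search(r"\d{4}", text or "")
    return int(match.group()) if match else None


def parse_rating(text):
    """Convert '7.9' or a comma-decimal '7,3' to a float; empty text gives None."""
    text = (text or "").strip()
    if not text:
        return None
    # Assumes a single decimal separator and no thousands separator.
    return float(text.replace(",", "."))


if __name__ == "__main__":
    print(parse_year("(2022)"))   # 2022
    print(parse_rating("7,3"))    # 7.3
    print(parse_rating("8.8"))    # 8.8
    print(parse_rating(""))       # None
```

With the DataFrame from the code above, these could be applied column by column, e.g. `df["year"] = df["year"].map(parse_year)` and likewise for `rating`.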
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
find_all() returns a list (a ResultSet), which means movies is a list of rows, not a single element. You need to loop over it with for movie in movies:
for movie in movies:
    title = movie.find('td',class_='titleColumn').a.text
    rating = movie.find('td',class_='ratingColumn imdbRating').strong.text
    year = movie.find('td',class_='titleColumn').span.text.strip('()')
    print(title, year, rating)
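To see the difference without requesting the live page, the sketch below builds a tiny hypothetical two-row table (the movie names are placeholders). find_all() returns a ResultSet, a list of Tag objects, so calling find() on it raises the same AttributeError as in the question, while calling find() on each row inside the loop works:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table, just enough to compare find_all() and find().
HTML = """
<table><tbody class="lister-list">
  <tr><td class="titleColumn"><a href="#">Movie A</a> <span class="secondaryInfo">(2021)</span></td></tr>
  <tr><td class="titleColumn"><a href="#">Movie B</a> <span class="secondaryInfo">(2022)</span></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
movies = soup.find("tbody", class_="lister-list").find_all("tr")

print(type(movies).__name__)  # ResultSet
print(len(movies))            # 2

try:
    # Wrong: movies is a list of rows, not a single Tag.
    movies.find("td", class_="titleColumn")
except AttributeError as exc:
    print("AttributeError:", exc)

# Right: each movie is a single <tr> Tag, so find() works on it.
titles = [movie.find("td", class_="titleColumn").a.get_text(strip=True) for movie in movies]
print(titles)  # ['Movie A', 'Movie B']
```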