'NoneType' object has no attribute 'text' error when scraping data



When I try to scrape data from this Amazon link, I get AttributeError: 'NoneType' object has no attribute 'text'.

My code:

import requests
from bs4 import BeautifulSoup as bs

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0',
           'Accept-Language': 'en-US,en;q=0.5'}
lap_site = requests.get('https://www.amazon.in/s?k=laptops&sprefix=%2Caps%2C634&ref=nb_sb_ss_recent_3_0_recent', headers=headers)
lap_soup = bs(lap_site.content, 'lxml')
content = lap_soup.find('div', class_='s-desktop-width-max s-desktop-content s-opposite-dir sg-row')
lap_detail_block = content.find_all('div', class_='a-section a-spacing-small a-spacing-top-small')
# Three separate lists ("a = b = c = []" would bind all three names to one list)
lap_name, lap_price, lap_rating = [], [], []
for i in lap_detail_block:
    laptop_name = i.find('h2').a.span.text
    lap_name.append(laptop_name)
    # find() returns None when there is no match; .text on None raises the AttributeError
    laptop_rating = i.find('span', class_='a-icon-alt').text
    lap_rating.append(laptop_rating)
    laptop_price = i.find('span', class_='a-price-whole').text
    lap_price.append(laptop_price)
laptop_details = {
    'Laptop': lap_name,
    'Price': lap_price,
    'Rating': lap_rating}
print(laptop_details)
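Independent of the AttributeError, a chained assignment such as lap_name = lap_price = lap_rating = [] does not create three lists: it evaluates [] once and binds all three names to that single list, so every append lands in the same place. A minimal stdlib-only illustration (the names are borrowed from the snippet; nothing here touches the network):

```python
# Chained assignment evaluates [] once and binds every name to that one object.
lap_name = lap_price = lap_rating = []
lap_name.append('HP 15s')
lap_price.append('58,699')
print(lap_rating)              # ['HP 15s', '58,699']
print(lap_name is lap_rating)  # True

# Unpacking a tuple of three separate literals gives three independent lists.
lap_name, lap_price, lap_rating = [], [], []
lap_name.append('HP 15s')
print(lap_rating)              # []
```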

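I thought the laptop_rating variable already stored the content as a string even without .text, and that this might be causing the NoneType error because we would be extracting text from text. Anyway, that turned out not to be the problem. How can I extract the price or the rating from this link?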
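The error itself comes from find(): it returns None when no element matches (for example, a product card without a rating, or a bot-check page with no product cards at all), and None has no .text attribute. A minimal sketch of a guard, assuming you would rather record a placeholder than crash; text_or_default is a hypothetical helper name, and it relies only on the get_text() method that BeautifulSoup tags provide:

```python
def text_or_default(tag, default=None):
    """Return the text of a BeautifulSoup tag with whitespace stripped,
    or `default` if the tag is None.

    find() returns None when nothing matches, so calling .text on its result
    directly raises AttributeError for items that lack the element.
    """
    if tag is None:
        return default
    return tag.get_text(strip=True)


# Hypothetical usage inside the loop from the question (requires bs4):
#     laptop_rating = text_or_default(i.find('span', class_='a-icon-alt'), 'No rating')
#     laptop_price = text_or_default(i.find('span', class_='a-price-whole'))
```

Note that this only helps with cards that are missing a field. If the whole page is a bot check, the earlier content.find_all(...) call fails first, because content itself is None.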

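At least in my testing, the page detects automated access and blocks it, so you need something like cloudscraper to get the real results page. The following code returns the expected results (adjust it to your own case):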

import cloudscraper
from bs4 import BeautifulSoup

scraper = cloudscraper.create_scraper()
r = scraper.get('https://www.amazon.in/s?k=laptops&sprefix=%2Caps%2C634&ref=nb_sb_ss_recent_3_0_recent')
soup = BeautifulSoup(r.content, 'html.parser')
# print(soup)  # uncomment to inspect what the server actually returned
content = soup.find('div', class_='s-desktop-width-max s-desktop-content s-opposite-dir sg-row')
if content is None:
    # The results container is missing, most likely because the request was blocked
    raise RuntimeError('Results container not found; the page may be a bot check')
lap_detail_block = content.find_all('div', class_='a-section a-spacing-small a-spacing-top-small')
lap_name, lap_price, lap_rating = [], [], []
for i in lap_detail_block:
    try:
        # Extract all three fields before appending, so an item with a missing
        # field does not leave the lists with different lengths
        laptop_name = i.find('h2').a.span.text
        laptop_rating = i.find('span', class_='a-icon-alt').text
        laptop_price = i.find('span', class_='a-price-whole').text
    except AttributeError as e:
        # find() returned None: this item lacks one of the elements
        print(e)
    else:
        lap_name.append(laptop_name)
        lap_rating.append(laptop_rating)
        lap_price.append(laptop_price)
        print(laptop_name, laptop_rating, laptop_price)
    print('_____________')
laptop_details = {
    'Laptop': lap_name,
    'Price': lap_price,
    'Rating': lap_rating,
}

This prints the following in the terminal:

HP 15s, 12th Gen Intel Core i5 8GB RAM/512GB SSD 15.6-inch(39.6 cm) FHD,Micro-Edge, Anti- Glare Display/Win 11/Intel Iris Xe Graphics/Dual Speakers/Alexa/Backlit KB/MSO/Fast Charge, 15s- fq5111TU 4.2 out of 5 stars 58,699
_____________
Acer Predator Helios 500 Gaming Laptop (11Th Gen Intel Core I9/17.3 Inches 4K Uhd Display/64Gb Ddr4 Ram/2Tb Ssd/1Tb HDD/RTX 3080 Graphics/Windows 10 Home/Per Key RGB Backlit Keyboard) Ph517-52 3.0 out of 5 stars 3,79,990
_____________
ASUS VivoBook 14 (2021), 14-inch (35.56 cm) HD, Intel Core i3-1005G1 10th Gen, Thin and Light Laptop (8GB/1TB HDD/Windows 11/Integrated Graphics/Grey/1.6 kg), X415JA-BV301W 3.8 out of 5 stars 27,990
_____________
[...]
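The scraped values are display strings. If you need numbers, a small conversion step is enough. This sketch assumes the formats shown in the output above: prices with Indian digit grouping such as 3,79,990, and ratings of the form "X out of 5 stars". parse_price and parse_rating are hypothetical helper names:

```python
import re


def parse_price(text):
    """Convert a price such as '3,79,990' to an int; the commas are only grouping."""
    return int(text.replace(',', ''))


def parse_rating(text):
    """Pull the leading number out of a rating such as '4.2 out of 5 stars'.

    Returns None if the text does not start with a number.
    """
    match = re.match(r'\s*(\d+(?:\.\d+)?)', text)
    return float(match.group(1)) if match else None


print(parse_price('58,699'), parse_price('3,79,990'))  # 58699 379990
print(parse_rating('4.2 out of 5 stars'))              # 4.2
```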

Details and installation instructions for cloudscraper: https://pypi.org/project/cloudscraper/
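Keeping three parallel lists (Laptop, Price, Rating) means that any product that is skipped or missing a field can shift the lists out of step. One alternative is to build one dict per product. This is a sketch under the assumption that each product card has already been reduced to its name, rating, and price strings (or None). The sample values are shortened from the output above, and the None rating is a hypothetical missing value:

```python
# Each scraped card becomes one record, so its fields can never drift apart.
scraped = [
    ('HP 15s', '4.2 out of 5 stars', '58,699'),
    ('ASUS VivoBook 14 (2021)', None, '27,990'),  # hypothetical missing rating
]

laptops = []
for name, rating, price in scraped:
    laptops.append({'Laptop': name, 'Rating': rating, 'Price': price})

for laptop in laptops:
    print(laptop['Laptop'], laptop['Rating'] or 'No rating', laptop['Price'])
```

A list of dicts like this can also be passed directly to pandas.DataFrame if you want a table, with one row per laptop.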
