Python web scraping - AttributeError: 'NoneType' object has no attribute 'text'



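I need some help. I am trying to web scrape laptop prices, ratings and product names from Flipkart into a CSV file using BeautifulSoup, Selenium and Pandas. The problem is that I get the error AttributeError: 'NoneType' object has no attribute 'text' when I try to append the scraped items to the empty lists.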

from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup

chrome_option = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path="C:/Users/folder/PycharmProjects/chromedriver.exe")
# Flipkart website
driver.get("https://www.flipkart.com/laptops/~cs-g5q3mw47a4/pr?sid=6bo%2Cb5g&collection-tab-name=Browsing&wid=13.productCard.PMU_V2_7")

products = []
prices = []
ratings = []

content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
for item in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    name = item.find('div', attrs={'class': '_4rR01T'})
    price = item.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class': '_3LWZlK'})

    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)

df = pd.DataFrame({'Product Name': products,
                   'Price': prices,
                   'Rating': ratings})
df.to_csv(r"C:\Users\folder\Desktop\webscrape.csv", index=True, encoding='utf-8')
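To see where the error comes from, here is a minimal reproduction on a made-up HTML snippet (the markup and the class names `card`, `name`, `price`, `rating` are hypothetical placeholders, not Flipkart's real page). `find()` returns `None` when nothing matches, and reading `.text` on `None` raises exactly this AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical product-card markup; the rating <div> is deliberately missing.
html = """
<a class="card" href="/p/1">
  <div class="name">Laptop A</div>
  <div class="price">50,000</div>
</a>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("a", class_="card")

# find() returns None when no element matches.
rating = card.find("div", class_="rating")
print(rating)  # None

try:
    rating.text  # same access as rating.text inside the question's loop
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object has no attribute 'text'
```

So the error means at least one of the class names did not match anything inside some card (for example because the page markup differs from what the selectors expect), not that `.text` itself is broken.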

You should use .contents or .get_text() instead of .text. Also, try to take care of the NoneType case:

if name:
    products.append(name.get_text())
if price:
    prices.append(price.get_text())
if rating:
    ratings.append(rating.get_text())
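A small sketch (hypothetical markup, placeholder class names) of what actually matters in the answer. In BeautifulSoup, `.text` is a property that calls `get_text()`, so swapping one for the other does not change anything by itself; calling `get_text()` on `None` fails the same way. The `if` guard is what avoids the error, and it has to test the tag (`rating`), not the list: an empty list is falsy, so a guard written as `if ratings` never appends anything.

```python
from bs4 import BeautifulSoup

# Hypothetical card: it has a rating but no price element.
html = '<a class="card"><div class="name">Laptop A</div><div class="rating">4.5</div></a>'
card = BeautifulSoup(html, "html.parser").find("a", class_="card")

name = card.find("div", class_="name")
price = card.find("div", class_="price")  # None: no such element
rating = card.find("div", class_="rating")

# .text is a property that calls get_text(), so both return the same string.
print(name.text == name.get_text())  # True

# get_text() on None raises just like .text does; only the guard avoids it.
try:
    price.get_text()
except AttributeError as exc:
    message = str(exc)
    print(message)

# Guard on the list (buggy) versus guard on the tag (fixed).
buggy, fixed = [], []
if buggy:  # empty list is falsy, so this branch never runs
    buggy.append(rating.get_text())
if rating is not None:
    fixed.append(rating.get_text())
print(buggy, fixed)  # [] ['4.5']
```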

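Found the solution! After replacing .text with .get_text() and adding the None checks from the answer above, the AttributeError went away. That led to another error, ValueError: arrays must all be same length. To track it down, print the len() of each list before passing them to the Pandas DataFrame, to confirm that they all have the same length.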
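The length check can be folded into a small helper that prints every column's length and refuses to build the DataFrame when they differ. `build_frame` and the sample lists below are hypothetical; pandas performs its own length check and raises a ValueError anyway, this sketch only makes the message name the offending column.

```python
import pandas as pd


def build_frame(columns):
    """Build a DataFrame after reporting each column's length.

    Raises ValueError listing the lengths when they differ, which is more
    informative than pandas' generic "same length" message.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    print(lengths)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return pd.DataFrame(columns)


# Hypothetical scraped lists: no rating was ever appended.
products = ["Laptop A", "Laptop B"]
prices = ["50,000", "62,000"]
ratings = []

try:
    build_frame({"Product Name": products, "Price": prices, "Rating": ratings})
except ValueError as exc:
    message = str(exc)
    print(message)

# Dropping the empty column, as done below, gives matching lengths.
df = build_frame({"Product Name": products, "Price": prices})
print(df.shape)  # (2, 2)
```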

In this case, the ratings list turned out to have a len() of 0 on every iteration of the for loop, so I left it out of the DataFrame df. Here is the modified code:

#--snip--

# empty lists, appended to later with the scraped items
products = []
prices = []
ratings = []
for item in soup.find_all('a', href=True, attrs={'class': '_1fQZEK'}):
    name = item.find('div', attrs={'class': '_4rR01T'})
    price = item.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    rating = item.find('div', attrs={'class': '_3LWZlK'})  # looked up, but no longer appended
    # append the info to the lists, skipping elements that were not found
    if name:
        products.append(name.get_text())
    if price:
        prices.append(price.get_text())
# check the list lengths before creating the pandas DataFrame
print(f"Products: {len(products)}")
print(f"Prices: {len(prices)}")
print(f"Ratings: {len(ratings)}")
df = pd.DataFrame({'Product Name': products,
                   'Price': prices})
# sending the DataFrame to CSV
df.to_csv(r"C:\Users\folder\Desktop\samplescrape.csv", index=True, encoding='utf-8')
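Keeping one list per field and skipping missing values means one missing element either triggers the length ValueError or, if two cards are each missing a different field, silently shifts values onto the wrong product. A sketch of an alternative, assuming the same kind of card loop: collect one dict per card, storing None for missing fields, so every row stays aligned and the ratings column can be kept. The markup, class names and the `text_or_none` helper are hypothetical.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for driver.page_source; class names are placeholders.
html = """
<a class="card"><div class="name">Laptop A</div><div class="price">50,000</div><div class="rating">4.5</div></a>
<a class="card"><div class="name">Laptop B</div><div class="price">62,000</div></a>
"""


def text_or_none(tag):
    # find() returns None for a missing element; keep it as a missing value.
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(html, "html.parser")
rows = []
for card in soup.find_all("a", class_="card"):
    # One dict per card, so a missing field cannot shift the other columns.
    rows.append({
        "Product Name": text_or_none(card.find("div", class_="name")),
        "Price": text_or_none(card.find("div", class_="price")),
        "Rating": text_or_none(card.find("div", class_="rating")),
    })

df = pd.DataFrame(rows)
print(df)
# df.to_csv("webscrape.csv", index=False, encoding="utf-8")  # hypothetical output path
```

Missing ratings end up as empty cells in the CSV instead of forcing the whole column to be dropped.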
