This is my first web scraping application.
Here is my code:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url= 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
#opening up connection, grabbing page
uClient = uReq(my_url)
#makes it a variable
page_html = uClient.read()
#will close it
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#grabs each container in HTML
containers = page_soup.find("div",{"class":"item-container"})
filename = "Products.csv"
f = open(filename, "w")
headers = "brand, product_name, shipping\n"
f.write(headers)
for container in containers:
    brand = containers.div.div.a["title"]
    title_container = containers.find("a", {"class": "item-title"})
    product_name = title_container[0].txt
    shipping_container = container.find("li", {"class": "price-ship"})
    shipping = shipping_container[0].txt.strip()
    print("brand: " + brand)
    print("product_name: " + product_name)
    print("shipping: " + shipping)
    f.write(brand + "," + product_name.replace(",", "|") + "," + shipping + "\n")
f.close()
Here is the error:
Traceback (most recent call last):
  File "<ipython-input-23-b9aa37e3923c>", line 1, in <module>
    runfile('/Users/Mohit/Documents/Python/webscrape.py', wdir='/Users/Mohit/Documents/Python')
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/Mohit/Documents/Python/webscrape.py", line 38, in <module>
    brand = containers.div.div.a["title"]
TypeError: 'NoneType' object is not subscriptable
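Basically, what I want is to grab the brand, product name, and shipping price of every graphics card on the page, and then format them as a CSV.
I think the program is not able to find the image, or where it should import the data from. This is my first web scraper project, and I used https://www.youtube.com/watch?v=XQgXKtPSzUI&t=800s as a tutorial.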
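It looks like you are accessing attributes of some variables without first checking that they exist. For example, on this line (the one raising the exception you are seeing, although the same applies to other lines in your code):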
brand = containers.div.div.a["title"]
I would suggest a more cautious approach. For example, this naive code:
if (containers is not None) and (containers.div is not None) and (containers.div.div is not None) and (containers.div.div.a is not None):
    brand = containers.div.div.a["title"]
else:
    brand = ""
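The chained check above can be folded into a small helper so it does not have to be repeated for every lookup. This is only a sketch: `safe_chain` and `title_or_blank` are hypothetical names, and the `SimpleNamespace` objects merely stand in for parsed tags so the snippet runs without fetching a page. It relies on the fact that a missing child tag in BeautifulSoup comes back as `None` when accessed as an attribute.

```python
from types import SimpleNamespace


def safe_chain(obj, *names):
    """Follow attribute names one at a time; return None as soon as a step is missing."""
    for name in names:
        if obj is None:
            return None
        obj = getattr(obj, name, None)
    return obj


def title_or_blank(link):
    """Return the link's "title" value, or "" when the link itself was not found."""
    return link.get("title", "") if link is not None else ""


# Stand-ins for parsed tags (assumption: real code would pass a BeautifulSoup Tag).
full = SimpleNamespace(div=SimpleNamespace(div=SimpleNamespace(a={"title": "EVGA"})))
broken = SimpleNamespace(div=SimpleNamespace(div=None))

brand_full = title_or_blank(safe_chain(full, "div", "div", "a"))
brand_broken = title_or_blank(safe_chain(broken, "div", "div", "a"))
print(repr(brand_full), repr(brand_broken))
```

With a real tag, the call would look like `title_or_blank(safe_chain(container, "div", "div", "a"))`, which yields an empty brand instead of a `TypeError` when the link is absent.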
If you want to debug the problem further, try nesting the conditions:
if containers is not None:
    if containers.div is not None:
        # ... more conditions here ...
        pass
    else:
        print("ERROR 2: containers.div was None! :(")
else:
    print("ERROR 1: containers was None! :(")
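Beyond the `None` checks, a few other things in the posted loop look worth revisiting: `find` returns only the first match (or `None`), while `find_all` returns a list of every match; the loop body reads from `containers` rather than `container`; and tags expose `.text`, not `.txt`. The sketch below puts those together. The HTML is a made-up sample, and the `item-brand` class and its nested `img` are assumptions about the page structure, so the selectors will probably need adjusting against the live markup.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the live structure may differ.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-brand"><img title="EVGA"></a>
  <a class="item-title">GeForce card, 8GB</a>
  <li class="price-ship"> Free Shipping </li>
</div>
<div class="item-container">
  <a class="item-title">Unbranded card</a>
</div>
"""


def text_or_blank(tag):
    """Return the stripped text of a tag, or "" when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else ""


def extract_rows(html):
    """Collect [brand, product_name, shipping] for every item container."""
    page_soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all yields every matching container, not just the first one.
    for container in page_soup.find_all("div", {"class": "item-container"}):
        brand_link = container.find("a", {"class": "item-brand"})
        brand_img = brand_link.find("img") if brand_link is not None else None
        brand = brand_img.get("title", "") if brand_img is not None else ""
        product_name = text_or_blank(container.find("a", {"class": "item-title"}))
        shipping = text_or_blank(container.find("li", {"class": "price-ship"}))
        rows.append([brand, product_name, shipping])
    return rows


def rows_to_csv(rows):
    """Serialize rows with the csv module, which quotes fields containing commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["brand", "product_name", "shipping"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Using `csv.writer` is a design choice rather than a requirement: it quotes product names that contain commas, so the `replace(",", "|")` workaround is no longer needed. To write to `Products.csv` instead of a string, the same writer can be given a file opened with `open(filename, "w", newline="")`.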