Web scraping app can't find the correct HTML container



This is my first web scraping application.

Here is my code:

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url= 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
#opening up connection, grabbing page
uClient = uReq(my_url)
#makes it a variable
page_html = uClient.read()
#will close it
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#grabs each container in HTML
containers = page_soup.find("div",{"class":"item-container"})
filename = "Products.csv"
f = open(filename, "w")
headers = "brand, product_name, shipping\n"
f.write(headers)
for container in containers:
    brand = containers.div.div.a["title"]
    title_container = containers.find("a", {"class": "item-title"})
    product_name = title_container[0].txt
    shipping_container = container.find("li", {"class": "price-ship"})
    shipping = shipping_container[0].txt.strip()
    print("brand: " + brand)
    print("product_name: " + product_name)
    print("shipping: " + shipping)
    f.write(brand + "," + product_name.replace(",", "|") + "," + shipping + "\n")
f.close()

Here is the error:

Traceback (most recent call last):
  File "<ipython-input-23-b9aa37e3923c>", line 1, in <module>
    runfile('/Users/Mohit/Documents/Python/webscrape.py', wdir='/Users/Mohit/Documents/Python')
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/Mohit/Documents/Python/webscrape.py", line 38, in <module>
    brand = containers.div.div.a["title"]
TypeError: 'NoneType' object is not subscriptable

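Basically, what I want is to grab the brand, product name, and shipping price of every graphics card on the page, and then format the results as a CSV.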
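For the CSV part of this goal, a minimal sketch using the standard-library csv module might look like the following. The rows are hypothetical placeholders, not scraped data. csv.writer quotes any field that contains a comma, so a workaround like replace(",", "|") is not needed, and opening the file with newline="" follows the csv module's documented recommendation.

```python
import csv

# Hypothetical placeholder rows standing in for scraped values.
rows = [
    ("Brand A", "Card X, 4GB", "Free Shipping"),
    ("Brand B", "Card Y", "$4.99 Shipping"),
]

# newline="" lets the csv module manage line endings itself.
with open("Products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["brand", "product_name", "shipping"])
    # The comma inside "Card X, 4GB" is quoted automatically.
    writer.writerows(rows)
```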

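I think the program can't find the image, or doesn't know where it should pull the data from. This is my first web scraping project, and I'm using https://www.youtube.com/watch?v=XQgXKtPSzUI&t=800s as a tutorial.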

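It looks like you are accessing attributes of some variables without checking whether they exist. For example, this line raises the exception you are seeing (and the same problem exists on other lines of your code): containers.div.div.a evaluated to None, and None cannot be subscripted with ["title"].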

brand = containers.div.div.a["title"]
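To see why this raises TypeError, note that in BeautifulSoup find() returns only the first matching tag (or None if nothing matches), while find_all() returns a list of every match. Attribute shortcuts such as tag.a are equivalent to tag.find("a"), so they also return None when no such descendant exists, and subscripting that None with ["title"] produces exactly the error above. The HTML snippet below is made up for illustration and is not Newegg's real markup.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first container has a link, the second does not.
html = """
<div class="item-container"><div><div><a title="Brand A">A</a></div></div></div>
<div class="item-container"><div><div><span>no link here</span></div></div></div>
"""
page = BeautifulSoup(html, "html.parser")

first = page.find("div", {"class": "item-container"})      # a single Tag
every = page.find_all("div", {"class": "item-container"})  # a list of Tags
missing = page.find("div", {"class": "no-such-class"})     # None: no match

print(type(first).__name__, len(every), missing)
print(every[1].div.div.a)  # None: the second container has no <a>

try:
    every[1].div.div.a["title"]  # subscripting None, as in the question
except TypeError as exc:
    print("TypeError:", exc)
```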

I'd suggest a more cautious approach, for example this naive code:

if (containers is not None) and (containers.div is not None) and (containers.div.div is not None) and (containers.div.div.a is not None):
  brand = containers.div.div.a["title"]
else:
  brand = ""
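The chained "is not None" checks above can be factored into a small helper. This is only a sketch: follow is a hypothetical name, not part of BeautifulSoup. It should work on BeautifulSoup tags because getattr(tag, "div") is the same as tag.div; here it is exercised with plain SimpleNamespace stand-ins so it runs without fetching any page. With a real tag you would read the attribute with link.get("title", "") instead of indexing a dict.

```python
from types import SimpleNamespace


def follow(obj, *names):
    """Walk obj.name1.name2..., returning None as soon as any step is None."""
    for name in names:
        if obj is None:
            return None
        obj = getattr(obj, name, None)
    return obj


# Stand-ins for parsed tags: one complete chain, one missing the <a>.
good = SimpleNamespace(div=SimpleNamespace(div=SimpleNamespace(a={"title": "Brand A"})))
broken = SimpleNamespace(div=SimpleNamespace(div=SimpleNamespace(a=None)))

for container in (good, broken, None):
    link = follow(container, "div", "div", "a")
    # Fall back to an empty string instead of crashing on None.
    brand = link["title"] if link is not None else ""
    print(repr(brand))
```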

If you want to dig further into where the problem is, try nested conditions:

if containers is not None:
  if containers.div is not None:
    # ... more conditions here ...
    pass
  else:
    print("ERROR 2: containers.div was None! :(")
else:
  print("ERROR 1: containers was None! :(")
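Putting these points together, a corrected version of the loop might look like the sketch below. It assumes the listing still uses the item-container, item-title and price-ship classes from the question (the live Newegg markup may differ), and the brand lookup through .div.div.a is copied from the question rather than verified against the site. To run offline it parses a small made-up HTML string; with the live page you would parse page_html read via uClient.read() as in the question. The main changes: find_all instead of find, the loop variable container instead of containers, .text instead of .txt, no [0] on the single tag that find() returns, and a None check before every lookup.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up markup modelled on the class names used in the question.
page_html = """
<div class="item-container">
  <div class="item-info"><div class="item-branding"><a title="Brand A"></a></div>
    <a class="item-title">Card X, 4GB</a>
    <ul><li class="price-ship"> Free Shipping </li></ul>
  </div>
</div>
<div class="item-container">
  <div class="item-info"><div class="item-branding"></div>
    <a class="item-title">Card Y</a>
  </div>
</div>
"""
page_soup = BeautifulSoup(page_html, "html.parser")

out = io.StringIO()  # swap for open("Products.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["brand", "product_name", "shipping"])

# find_all returns every matching container, not just the first one.
for container in page_soup.find_all("div", {"class": "item-container"}):
    info = container.div
    branding = info.div if info is not None else None
    link = branding.a if branding is not None else None
    brand = link.get("title", "") if link is not None else ""

    title_tag = container.find("a", {"class": "item-title"})
    product_name = title_tag.text.strip() if title_tag is not None else ""

    ship_tag = container.find("li", {"class": "price-ship"})
    shipping = ship_tag.text.strip() if ship_tag is not None else ""

    writer.writerow([brand, product_name, shipping])

print(out.getvalue())
```

A missing element then becomes an empty CSV field instead of an exception, which also makes it easy to spot which items the selectors did not match.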
