Web scraping is not working



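I was looking for something to do with my favorite software, and I came across web scraping, which I found really amazing. With my Python experience, I got some hands-on time with Beautiful Soup and requests. Here is the code: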

import html5lib
import requests
from bs4 import BeautifulSoup as BS
# Get all the a strings , next siblings and next siblings
def makeSoup(urls):
    url = requests.get(urls).text
    return BS(url,"html5lib")   
def something(soup):
    for anchor in soup.findAll("a",{"data-type":"externalLink"}):
        print(anchor.string)
        next_sibling = anchor.nextSibling
        water = str(next_sibling.string)
        water = water[0:5]
        while  water != "(202)":
            next_sibling = next_sibling.nextSibling
            if next_sibling == None:
                continue
            if next_sibling.string != None:
                print(next_sibling.string)
                water = str(next_sibling.string)
                water = water[0:5]
soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide.htm")
something(soup)
soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide_2.htm")
something(soup)
soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide_3.htm")
something(soup)

But sadly, I ran into every programmer's nightmare, an error:
Traceback (most recent call last):
  File "C:UsersRajDesktopkunal projectsPythonlisting_out_all_embassies.py", line 26, in <module>
    something(soup)
  File "C:UsersRajDesktopkunal projectsPythonlisting_out_all_embassies.py", line 17, in something
    next_sibling = next_sibling.nextSibling
AttributeError: 'NoneType' object has no attribute 'nextSibling'

What am I doing wrong? I am new to programming as well as to web scraping. Also, what good practices am I not following? Thanks for reading to the end.

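You have to check whether next_sibling is None before you use next_sibling.nextSibling. In your code, continue jumps back to the top of the while loop with next_sibling still None, so the next call to next_sibling.nextSibling raises the AttributeError. Use break there instead: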

def something(soup):
    for anchor in soup.findAll("a",{"data-type":"externalLink"}):
        print(anchor.string)
        next_sibling = anchor.nextSibling
        if next_sibling is None:
            continue  # anchor has no siblings, go on to the next anchor
        water = str(next_sibling.string)
        water = water[0:5]
        while water != "(202)":
            next_sibling = next_sibling.nextSibling
            if next_sibling is None:
                break  # no more siblings, leave the while loop
            if next_sibling.string is not None:
                print(next_sibling.string)
                water = str(next_sibling.string)
                water = water[0:5]

But I would write it shorter:

def something(soup):
    for anchor in soup.findAll("a",{"data-type":"externalLink"}):
        water = None  # create variable to use it first time in "while"
        while anchor and water != "(202)":
            if anchor.string:
                print(anchor.string)
                water = anchor.string[:5]
            anchor = anchor.nextSibling
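
If you want to try the sibling walk without hitting the site, here is a minimal sketch that runs on an inline HTML string. The markup in SAMPLE_HTML is made up and only imitates the rough shape of the listing (the real page may be structured differently), and collect_entries is a name I chose for illustration. It uses the next_siblings generator, which simply stops at the last sibling, so there is no None to guard against, and it returns the lines instead of printing them so they are easier to check.

```python
from bs4 import BeautifulSoup

# Made-up markup: an assumption about the page layout, not the real page.
SAMPLE_HTML = """
<p><a data-type="externalLink" href="#">Embassy of Examplestan</a><br/>
1 Example Street NW<br/>
(202) 555-0100</p>
<p><a data-type="externalLink" href="#">Embassy of Nowhere</a><br/>
2 Placeholder Avenue NW</p>
"""


def collect_entries(soup, stop_prefix="(202)"):
    """Return one list of text lines per matching anchor."""
    entries = []
    for anchor in soup.find_all("a", {"data-type": "externalLink"}):
        lines = [anchor.get_text(strip=True)]
        # next_siblings yields each following sibling and then ends,
        # so the loop finishes cleanly when the siblings run out.
        for sibling in anchor.next_siblings:
            # .string is None for tags such as <br/>, hence the fallback.
            text = (sibling.string or "").strip()
            if not text:
                continue  # skip <br/> tags and whitespace-only strings
            lines.append(text)
            if text.startswith(stop_prefix):
                break  # phone number reached, this entry is complete
        entries.append(lines)
    return entries


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for entry in collect_entries(soup):
    print(entry)
```

The second entry has no phone number on purpose: the loop just ends when the siblings run out, which is the case that raised the AttributeError in the question.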