Looping through web pages when some pages have no href

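I'm collecting reviews from the web. Some products have multiple pages of reviews; others have only one. With help from some people here, I wrote code that basically makes the scraper follow the "next page" link whenever there is one.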

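My problem is that when there is only one page of reviews, there is no link to click, and the scraper just keeps waiting. I want the program to check whether the next-page link exists: if it does, click it; if it doesn't, go back to the top of the loop.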

Here is my code:

import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

for url in list_urls:
    while True:
        raw_html = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(raw_html, 'html.parser')
        # See if the "next page" link exists: if it does not, go back to the top of the loop
        href_test = soup.find('div', id='company_reviews_pagination')
        if href_test is None:
            break
        # If the next-page link exists, follow it
        last_link = href_test.find_all('a')[-1]
        if last_link.text.startswith('Next'):
            next_url_parts = urllib.parse.urlparse(last_link['href'])
            url = urllib.parse.urlunparse(...)  # code to define the "next-page" url - that part works!
        else:
            break

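So far it gives me no errors, but the program doesn't get anywhere; it just keeps waiting. What am I doing wrong? Should I use a "try" statement to handle this case specifically?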

Many thanks. Any guidance is greatly appreciated.

Here is how I fixed it. Instead of fiddling with an "if the link exists" condition, I used try/except:

    try:
        last_link = soup.find('div', id='company_reviews_pagination').find_all('a')[-1]
        if last_link.text.startswith('Next'):
            next_url_parts = urllib.parse.urlparse(last_link['href'])
            url = urllib.parse.urlunparse(...)  # code to find the next-page link
        else:
            break
    except (AttributeError, IndexError, KeyError):
        # No pagination div, no links inside it, or no href on the last link: stop paging
        break
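
For anyone who would rather keep the explicit "does the link exist" check, here is a minimal sketch of the same idea without try/except. It is not the exact code I ran. The names follow_pages, fetch, find_next_href, PAGINATION_ID and max_pages are made up for illustration, and urljoin stands in for my own urlparse/urlunparse step, on the assumption that the href is either a relative or an absolute URL. The timeout and the set of seen URLs are extra guards I added so that a stalled request or a page linking back to itself cannot keep the loop waiting forever.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

PAGINATION_ID = "company_reviews_pagination"  # div id taken from the code above


def find_next_href(html):
    """Return the href of the "Next" link, or None when there is no next page."""
    # Imported here so follow_pages() below can be used without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    pagination = soup.find("div", id=PAGINATION_ID)
    if pagination is None:
        return None  # single page of reviews: no pagination div at all
    links = pagination.find_all("a")
    if not links:
        return None  # div is present but contains no links
    last_link = links[-1]
    if not last_link.get_text(strip=True).startswith("Next"):
        return None  # last page: the final link is not "Next"
    return last_link.get("href")  # None if the tag has no href attribute


def fetch(url, timeout=10):
    """Download a page; with a timeout, a stalled request raises an error."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def follow_pages(start_url, fetch=fetch, find_next_href=find_next_href, max_pages=50):
    """Yield (url, html) for start_url and each page reached via "Next" links."""
    url, seen = start_url, set()
    # Stop when there is no next link, a URL repeats, or max_pages is reached.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        href = find_next_href(html)
        url = urljoin(url, href) if href else None
```

With this, the outer loop becomes "for url in list_urls:" with an inner "for page_url, raw_html in follow_pages(url):", and the inner loop simply ends when a product has only one page of reviews.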
