Python webscraping - how to print "end of list" in case of error

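I want to scrape the list of presidents from the Wikipedia page. The code does this fine; however, after listing the names and scraping Biden, I get the following error because there are no more names to scrape. Does anyone know a way to have it print "End of list" instead of raising an error once it recognizes there are no more names to extract? Thanks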

Traceback (most recent call last):
  File "filepath\WebcrawingMod6.py", line 11, in <module>
    print(name.get_text('title'))
AttributeError: 'NoneType' object has no attribute 'get_text'

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
tb = soup.find('table', class_='wikitable')
for link in tb.find_all('b'):
    name = link.find('a')
    print(name.get_text('title'))

I'd suggest looking into try/except statements!

import requests
from bs4 import BeautifulSoup

try:
    url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    tb = soup.find('table', class_='wikitable')
    for link in tb.find_all('b'):
        name = link.find('a')
        print(name.get_text('title'))
except AttributeError:
    # Raised when link.find('a') returns None for a <b> tag without a link
    print("End of file.")
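
One thing worth noting about this approach: because the try wraps the whole loop, the first <b> tag without a link ends the loop, so anything after it would not be printed. Here is a minimal sketch of that behavior on a small made-up HTML snippet, so it runs without a network request. SAMPLE_HTML and names_until_error are hypothetical names used only for illustration, and the snippet only imitates the structure of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table: two linked names, then a
# <b> tag with no <a> child, similar to the 'Sources:' cell on the real page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><td><b><a href="/wiki/A">Donald Trump</a></b></td></tr>
  <tr><td><b><a href="/wiki/B">Joe Biden</a></b></td></tr>
  <tr><td><b>Sources:</b></td></tr>
</table>
"""


def names_until_error(html):
    """Collect linked names until a <b> without a link raises AttributeError."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    names = []
    try:
        for bold in table.find_all("b"):
            link = bold.find("a")          # None when the <b> has no <a> child
            names.append(link.get_text())  # AttributeError if link is None
    except AttributeError:
        # The loop stops here; later <b> tags are never visited.
        names.append("End of list")
    return names


if __name__ == "__main__":
    for line in names_until_error(SAMPLE_HTML):
        print(line)
```

This works for the question as asked because the link-less tag comes last; if such a tag appeared in the middle of the table, the names after it would be lost.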

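On the page you're scraping, there is a b tag whose value is 'Sources:'. It has no child a tag, so find('a') returns None for it. Your code doesn't account for that case.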

I suggest:

import requests
from bs4 import BeautifulSoup as BS

(r := requests.get('https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States')).raise_for_status()
soup = BS(r.text, 'lxml')
for b in soup.find('table', class_='wikitable sortable').find_all('b'):
    if (ba := b('a')):
        print(ba[0].text)
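
To get the "End of list" message the question asks for, one option is to skip any <b> tag without a link and print the message once the loop has finished. The sketch below assumes a small made-up HTML snippet in place of the live page, and it uses the built-in 'html.parser' so lxml is not required. SAMPLE_HTML and linked_names are hypothetical names, not part of the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet imitating the page: a link-less <b> sits between
# two linked names to show that later rows are still reached.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><td><b><a href="/wiki/A">Barack Obama</a></b></td></tr>
  <tr><td><b>Sources:</b></td></tr>
  <tr><td><b><a href="/wiki/B">Joe Biden</a></b></td></tr>
</table>
"""


def linked_names(html):
    """Return the text of every <b><a>...</a></b> inside the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:  # no matching table at all
        return []
    names = []
    for bold in table.find_all("b"):
        link = bold.find("a")
        if link is None:  # e.g. <b>Sources:</b> has no link, so skip it
            continue
        names.append(link.get_text(strip=True))
    return names


if __name__ == "__main__":
    for name in linked_names(SAMPLE_HTML):
        print(name)
    print("End of list")  # reached only after every <b> has been checked
```

Checking for None explicitly avoids relying on an exception for normal control flow, and the loop keeps going past tags that have no link.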
