How do I handle exceptions in BeautifulSoup if the element I'm looking for isn't found?



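I'm making an HTTP request to a website and parsing its content to look up some attribute values. What I need to know is how to handle the case where the lookup returns [], None, or nothing at all.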

What I've tried:

import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter
from bs4 import BeautifulSoup

def get_url():
    s = requests.Session()
    retries = Retry(total=5,
                    backoff_factor=10,
                    status_forcelist=[500, 502, 503, 504])
    s.mount('http://', HTTPAdapter(max_retries=retries))
    r = s.get('http://httpstat.us/500')

def find_data():
    soup = BeautifulSoup(r.text, "lxml")
    try:
        id = soup.find('a', class_="class").get('id')
    except:
        print('id not found')
        get_url()

Basically, if the id isn't found, I want to make the GET request again and try to find it.

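You can apply the "Look Before You Leap" (LBYL) principle and check the result of find(): it returns None if the element is not found. You can then put things into a loop and exit once you have a value, while guarding against endless retries with a loop counter limit: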

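import requests
from bs4 import BeautifulSoup

RETRIES = 10
url = 'http://httpstat.us/500'  # URL from the question; substitute your own

id = None
session = requests.Session()
for attempt in range(1, RETRIES + 1):
    response = session.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    # id=True matches only <a> tags that have an id attribute
    element = soup.find('a', class_="class", id=True)
    if element is None:
        print("Attempt {attempt}. Element not found".format(attempt=attempt))
        continue
    id = element["id"]
    break

print(id)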

A couple of notes:

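  • id=True is set so that only elements with an id attribute are matched. You can do the equivalent with a CSS selector: soup.select_one("a.class[id]")
  • Session() helps improve performance when you issue multiple requests to the same host. See "Session Objects" in the requests documentation for more.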
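The first note can be checked without any network access. The following is a minimal sketch, assuming made-up sample markup and the built-in "html.parser" backend instead of "lxml" so that nothing extra needs to be installed; it compares find(..., id=True) with the CSS selector and shows what each returns when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from any real page.
html = """
<a class="class">no id here</a>
<a class="class" id="target">has an id</a>
<a class="other" id="ignored">wrong class</a>
"""

soup = BeautifulSoup(html, "html.parser")

# id=True matches only tags that carry an id attribute.
by_find = soup.find("a", class_="class", id=True)
# The CSS attribute selector [id] expresses the same condition.
by_select = soup.select_one("a.class[id]")
print(by_find["id"], by_select["id"])

# Without id=True, find() returns the first <a class="class">, which has
# no id, so .get("id") yields None instead of raising.
first = soup.find("a", class_="class")
print(first.get("id"))

# When nothing matches at all, both calls return None.
print(soup.find("a", class_="missing", id=True))
print(soup.select_one("a.missing[id]"))
```

Note the middle case: an element that exists but lacks the attribute does not raise, so a try/except around .get('id') alone would not catch it.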

If you just want to make the same request again, you could do something like this:

import requests
from bs4 import BeautifulSoup

def find_data(url):
    found_data = False
    while not found_data:
        r = requests.get(url)
        soup = BeautifulSoup(r.text, "lxml")
        try:
            id = soup.find('a', class_="class").get('id')
            found_data = True
        except AttributeError:
            # find() returned None, so .get() failed; request the page again
            pass
    return id

If the data really isn't there, this puts you at risk of an infinite loop. You can avoid that by limiting the number of attempts:

import requests
from bs4 import BeautifulSoup

def find_data(url, attempts_before_fail=3):
    found_data = False
    while not found_data:
        r = requests.get(url)
        soup = BeautifulSoup(r.text, "lxml")
        try:
            id = soup.find('a', class_="class").get('id')
            found_data = True
        except AttributeError:
            # find() returned None; count the failed attempt
            attempts_before_fail -= 1
            if attempts_before_fail <= 0:
                raise ValueError("couldn't find data after all.")
    return id
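
One way to make the bounded retry easier to try out is to separate parsing from fetching. The sketch below is an assumption-laden variant, not the answer's own code: extract_id and find_id_with_retries are hypothetical names, fetch is any zero-argument callable that returns HTML text, and "html.parser" stands in for "lxml". An in-memory iterator plays the role of a page that only shows the link on the third request.

```python
from bs4 import BeautifulSoup


def extract_id(html):
    """Return the id of the first <a class="class"> that has one, else None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find("a", class_="class", id=True)
    return element["id"] if element is not None else None


def find_id_with_retries(fetch, attempts=3):
    """Call fetch() up to `attempts` times until extract_id() finds an id.

    `fetch` is a hypothetical zero-argument callable returning HTML text,
    for example: lambda: requests.get(url).text
    """
    for _ in range(attempts):
        found = extract_id(fetch())
        if found is not None:
            return found
    raise ValueError("couldn't find data after all.")


# Stand-in for a flaky page: the first two responses lack a usable link.
pages = iter([
    "<p>still loading</p>",
    '<a class="class">no id yet</a>',
    '<a class="class" id="abc123">ready</a>',
])
print(find_id_with_retries(lambda: next(pages), attempts=3))  # prints abc123
```

With real requests, passing lambda: session.get(url).text as fetch would reuse one Session across attempts, in line with the note about sessions above.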
