concurrent.futures does not continue after an exception

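I am using concurrent.futures to scrape the h1 heading from a list of web pages and append each one to a list named archive_h1_list. The problem is that once concurrent.futures hits an exception, it stops appending to the list.

When I print the resulting list, it stops after the first exception: ['Example Domain', 'Example Domain', 'Exception Error!']. After hitting the exception, it never goes on to process the h1 of the last https://www.example.com in the list.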

import concurrent.futures
from urllib.request import urlopen
from bs4 import BeautifulSoup

CONNECTIONS = 8
archive_url_list = ["https://www.example.com", "https://www.example.com", "sdfihaslkhasd", "https://www.example.com"]
archive_h1_list = []

def get_archive_h1(h1_url):
    html = urlopen(h1_url)
    bsh = BeautifulSoup(html.read(), 'lxml')
    return bsh.h1.text.strip()

def concurrent_calls():
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
        f1 = executor.map(get_archive_h1, archive_url_list)
        try:
            for future in f1:
                archive_h1_list.append(future)
        except Exception:
            archive_h1_list.append("Exception Error!")

The expected output is:

['Example Domain', 'Example Domain', 'Exception Error!', 'Example Domain']

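This happens because your for loop is inside the try block. When an exception is raised, control leaves the try block and jumps to the except block. It never returns to the loop, so the remaining results are not processed.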

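One way to fix this would be to move the for loop out of the try block, so that the try wraps only the loop body. However, according to the documentation for Executor.map:

"If a func call raises an exception, then that exception will be raised when its value is retrieved from the iterator."

The exception is therefore raised by the iteration itself rather than by the loop body, which makes handling it outside the function quite awkward.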
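
To illustrate why handling the exception around each retrieval does not rescue Executor.map, here is a minimal sketch that avoids the network. fake_h1 and collect_with_map are hypothetical names, not part of the original code; fake_h1 stands in for get_archive_h1 and simply raises for a malformed URL. In CPython the object returned by map is a generator, so once an exception propagates out of it the iterator is finished and the results that follow are lost.

```python
import concurrent.futures

def fake_h1(url):
    # Hypothetical stand-in for get_archive_h1: fails on a malformed URL.
    if not url.startswith("https://"):
        raise ValueError("unknown url type: " + url)
    return "Example Domain"

def collect_with_map(urls):
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        iterator = executor.map(fake_h1, urls)
        while True:
            try:
                # The worker's exception is re-raised here, by next() itself.
                results.append(next(iterator))
            except StopIteration:
                break
            except Exception:
                results.append("Exception Error!")
    return results

urls = ["https://www.example.com", "https://www.example.com",
        "sdfihaslkhasd", "https://www.example.com"]
print(collect_with_map(urls))
# ['Example Domain', 'Example Domain', 'Exception Error!']
```

Even with the try placed around every single retrieval, the fourth result never arrives, because the iterator ends as soon as it has raised.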

So the first solution is to catch the exception inside get_archive_h1:

def get_archive_h1(h1_url):
    try:
        html = urlopen(h1_url)
        bsh = BeautifulSoup(html.read(), 'lxml')
        return bsh.h1.text.strip()
    except Exception:
        return "Exception Error!"

def concurrent_calls():
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
        f1 = executor.map(get_archive_h1, archive_url_list)
        for future in f1:
            archive_h1_list.append(future)
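
If you would rather not edit the worker function itself, the same idea can be sketched as a small wrapper. This is only an illustration under assumptions: with_fallback, fake_h1 and collect are hypothetical names that do not appear in the original code, and fake_h1 replaces the real network call so the example runs offline.

```python
import concurrent.futures
import functools

def with_fallback(func, fallback):
    # Hypothetical helper: returns a version of func that never raises
    # and yields `fallback` instead of propagating an exception.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            return fallback
    return wrapper

def fake_h1(url):
    # Hypothetical stand-in for get_archive_h1: fails on a malformed URL.
    if not url.startswith("https://"):
        raise ValueError("unknown url type: " + url)
    return "Example Domain"

def collect(urls):
    safe_h1 = with_fallback(fake_h1, "Exception Error!")
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # map never sees an exception, so it yields one item per URL, in order.
        return list(executor.map(safe_h1, urls))

urls = ["https://www.example.com", "https://www.example.com",
        "sdfihaslkhasd", "https://www.example.com"]
print(collect(urls))
# ['Example Domain', 'Example Domain', 'Exception Error!', 'Example Domain']
```

Because the exception is swallowed inside the worker thread, the iterator returned by map is never interrupted and the results keep their input order.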

Another solution is to use a different executor method that gives you more control over how each future is resolved, namely Executor.submit:

def concurrent_calls():
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
        futures = [executor.submit(get_archive_h1, url) for url in archive_url_list]
        for future in futures:
            try:
                archive_h1_list.append(future.result())
            except Exception:
                archive_h1_list.append("Exception Error!")
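
Since submit gives you one Future per URL, you can also keep each URL next to its future and report which one failed. The following is a rough sketch rather than a definitive implementation: fake_h1 and collect_with_submit are hypothetical names, fake_h1 replaces the real network call, and the message format is made up for illustration. It relies on Future.exception(), which waits for the call to finish and returns the raised exception, or None if the call succeeded.

```python
import concurrent.futures

def fake_h1(url):
    # Hypothetical stand-in for get_archive_h1: fails on a malformed URL.
    if not url.startswith("https://"):
        raise ValueError("unknown url type: " + url)
    return "Example Domain"

def collect_with_submit(urls):
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # Keep each URL paired with its future, in submission order.
        pairs = [(url, executor.submit(fake_h1, url)) for url in urls]
        for url, future in pairs:
            error = future.exception()  # blocks until this call has finished
            if error is None:
                results.append(future.result())
            else:
                results.append("Exception Error! (%s: %s)" % (url, type(error).__name__))
    return results

urls = ["https://www.example.com", "https://www.example.com",
        "sdfihaslkhasd", "https://www.example.com"]
print(collect_with_submit(urls))
```

Each future is resolved independently, so a failure in one call has no effect on retrieving the others, and iterating in submission order keeps the results aligned with the input list.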
