Fetching multiple URLs with asyncio/aiohttp and retrying failures

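I'm trying to write some asynchronous GET requests with the aiohttp package and have most of it figured out, but I'm wondering what the standard approach is for handling failures (which are returned as exceptions).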

What I have so far:

import asyncio
import aiofiles
import aiohttp
from pathlib import Path

with open('urls.txt', 'r') as f:
    urls = [s.rstrip() for s in f.readlines()]


async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        data = await response.text()
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)


async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        results = await asyncio.gather(*[fetch(session, url) for url in urls],
                return_exceptions=True)
        return results


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(urls, loop))

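This now runs fine:

As expected, the results variable is populated with None entries wherever the corresponding URL (i.e. the one at the same index in the urls list, and therefore on the same line of the input file urls.txt) was requested successfully, and the corresponding file is written to disk.
This means I can use the results variable to determine which URLs failed (the entries in results that are not None).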
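A minimal sketch of that check, assuming results comes from asyncio.gather(..., return_exceptions=True) and is in the same order as urls. The helper name failed_urls is hypothetical. It tests for exception instances rather than "not None", so it keeps working if fetch is later changed to return a value:

```python
def failed_urls(urls, results):
    """Pair each URL with its gather() result and keep the ones that raised.

    Assumes results[i] belongs to urls[i], which is the ordering that
    asyncio.gather() guarantees for its return value.
    """
    return [(url, res) for url, res in zip(urls, results)
            if isinstance(res, Exception)]


if __name__ == '__main__':
    # Hypothetical sample data standing in for a real run.
    sample_urls = ['https://example.com/a', 'https://example.com/b']
    sample_results = [None, OSError('Connect call failed')]
    for url, error in failed_urls(sample_urls, sample_results):
        print(f'{url} failed: {error!r}')
```

The returned URLs could then be fed into a second call to fetch_all.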

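I have looked at a few different guides to the various asynchronous Python packages (aiohttp, aiofiles and asyncio), but I haven't seen a standard way of handling this final step.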

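Should the retry of the GET request happen after the await statement has "finished"/"completed"?
...or should the retry be triggered by some sort of callback on failure?
- The errors look like this: ClientConnectorError(111, "Connect call failed ('000.XXX.XXX.XXX', 443)"), i.e. the request to IP address 000.XXX.XXX.XXX on port 443 failed, probably because the server enforces some limit that I should respect by waiting before retrying.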

Is there some kind of limit I should consider imposing, so that requests go out in batches rather than all at once?

When I try a few hundred (over 500) URLs from my list, only about 40-60 requests succeed.

Naively, I expected run_until_complete to handle this so that it would only finish once every URL had been requested successfully, but that isn't the case.

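I haven't worked with asynchronous Python or with sessions/loops before, so I'd appreciate any help in getting complete results. Please let me know if I can provide more information to improve this question. Thank you!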

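Should the retry of the GET request happen after the await statement has "finished"/"completed"? ...or should the retry be triggered by some sort of callback on failure?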

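You can do the former. You don't need any special callback: because you are executing inside a coroutine, a simple while loop is enough, and it won't interfere with the execution of other coroutines. For example: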

async def fetch(session, url):
    data = None
    while data is None:
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                data = await response.text()
        except aiohttp.ClientError:
            # sleep a little and try again
            await asyncio.sleep(1)
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)
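The loop above retries forever, so a URL that never succeeds (a permanent 404, say) would keep it spinning. A minimal sketch of one way to bound it, and to address the question about limiting how many requests are in flight, is shown below. It uses only the standard library and a simulated flaky operation so that it runs without a network. The names with_retries, limited, flaky and demo, and the default values, are all hypothetical. In the aiohttp case, make_attempt would be a small coroutine function wrapping session.get(url), and retry_on would be aiohttp.ClientError:

```python
import asyncio
import random


async def with_retries(make_attempt, *, retry_on, max_attempts=5, base_delay=0.5):
    """Await make_attempt() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_attempt()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up; gather(return_exceptions=True) collects this
            # Exponential backoff plus a little jitter before the next try.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, base_delay))


async def limited(semaphore, make_attempt, **retry_kwargs):
    """Run with_retries while holding one slot of the semaphore.

    The slot is held during backoff sleeps too, which also throttles
    how hard a struggling server gets hit.
    """
    async with semaphore:
        return await with_retries(make_attempt, **retry_kwargs)


async def demo():
    calls = {'a': 0, 'b': 0}

    async def flaky(name, fail_times):
        # Simulated request: fails the first fail_times calls, then succeeds.
        calls[name] += 1
        if calls[name] <= fail_times:
            raise ConnectionError(f'{name}: simulated failure')
        return f'{name}: ok'

    semaphore = asyncio.Semaphore(10)  # at most 10 operations at a time
    return await asyncio.gather(
        limited(semaphore, lambda: flaky('a', 2),
                retry_on=ConnectionError, base_delay=0.01),
        limited(semaphore, lambda: flaky('b', 9),
                retry_on=ConnectionError, max_attempts=3, base_delay=0.01),
        return_exceptions=True,
    )


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

Here the first operation succeeds on its third attempt, while the second exhausts its three attempts and its exception ends up in the results list. A sensible semaphore size and delay depend on the server being queried.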

Naively, I expected run_until_complete to handle this so that it would only finish once every URL had been requested successfully

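The term "complete" is meant in the technical sense of a coroutine completing (running its course), which happens when the coroutine either returns or raises an exception.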
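A small standard-library sketch of that point, with hypothetical coroutine names: both coroutines below count as complete, one by returning and one by raising, so gather (and therefore run_until_complete or asyncio.run) finishes even though one of them failed.

```python
import asyncio


async def succeeds():
    return 'done'


async def fails():
    raise RuntimeError('boom')


async def main():
    # With return_exceptions=True the raised exception is returned as a
    # result instead of propagating out of gather().
    return await asyncio.gather(succeeds(), fails(), return_exceptions=True)


if __name__ == '__main__':
    print(asyncio.run(main()))
```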
