Python asyncio/aiohttp: ValueError: too many file descriptors in select() on Windows

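Note: future readers should know that this question is old, badly formatted, and was written in a hurry. The answers given may be useful, but the question and its code probably are not.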

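Hello everyone,

I am having trouble understanding asyncio and aiohttp and getting the two to work together. Because I don't really understand what I'm doing, I have run into a problem that I don't know how to solve.

I am using Windows 10 64-bit.

The following code returns a list of pages that do not contain "html" in the Content-Type header. It is implemented using asyncio.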

import asyncio
import aiohttp
MAXitems = 30
async def getHeaders(url, session, sema):
    async with session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except:
                        return url, False
            except:
                return url, False

def check_urls_without_html(list_of_urls):
    headers_without_html = set()
    while(len(list_of_urls) != 0):
        blockurls = []
        print(len(list_of_urls))
        items = 0
        for num in range(0, len(list_of_urls)):
            if num < MAXitems:
                blockurls.append(list_of_urls[num - items])
                list_of_urls.remove(list_of_urls[num - items])
                items += 1
        loop = asyncio.get_event_loop()
        semaphoreHeaders = asyncio.Semaphore(50)
        session = aiohttp.ClientSession()
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if not header[1]:
                headers_without_html.add(header)
    return headers_without_html

list_of_urls= ['http://www.google.com', 'http://www.reddit.com']
headers_without_html =  check_urls_without_html(list_of_urls)
for header in headers_without_html:
    print(header[0])

When I run it with too many URLs (e.g. 2000), it sometimes fails with an error like this:

data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 454, in run_until_complete
    self.run_forever()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 421, in run_forever
    self._run_once()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 1390, in _run_once
    event_list = self._selector.select(timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()

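I have read that this problem is caused by a limitation in Windows. I have also read that not much can be done about it, other than trying to use fewer file descriptors.

I have seen people push thousands of requests with asyncio and aiohttp, but even with my chunking I can't push 30-50 without running into this error.

Is there something fundamentally wrong with my code, or is this a problem inherent to Windows? Can it be fixed? Can one increase the maximum number of file descriptors allowed in select()?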

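By default, Windows can use only 64 sockets in an asyncio loop. This is a limitation of the underlying select() API call.

To increase the limit, use ProactorEventLoop, which you can enable with the code below. See the asyncio documentation for full details.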

import asyncio
import sys

if sys.platform == 'win32':
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)
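
For context, here is a minimal sketch of how that loop selection might sit inside a complete script. The helper names (`new_event_loop_for_platform`, `run`, `demo`) are hypothetical and not part of the answer, and `demo` is only a stand-in for real network work. Note that since Python 3.8 ProactorEventLoop is already the default on Windows, so an explicit switch mainly matters on older versions such as the 3.6 shown in the traceback.

```python
import asyncio
import sys


def new_event_loop_for_platform():
    """Return a ProactorEventLoop on Windows, the default loop type elsewhere."""
    if sys.platform == "win32":
        # Only exists on Windows, so it is referenced inside this branch only.
        return asyncio.ProactorEventLoop()
    return asyncio.new_event_loop()


async def demo():
    # Hypothetical placeholder for the real aiohttp requests.
    await asyncio.sleep(0)
    return "done"


def run(coro):
    """Run a coroutine to completion on a fresh loop and always close the loop."""
    loop = new_event_loop_for_platform()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


if __name__ == "__main__":
    print(run(demo()))
```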

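Another solution is to limit overall concurrency with a semaphore; see the answer provided here. For example, when making 2000 API calls you probably don't want too many requests open in parallel (they may time out, and it becomes harder to see how long an individual call takes). That would give you: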

await gather_with_concurrency(100, *my_coroutines)
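
The body of `gather_with_concurrency` is not shown in this thread, so the following is only a sketch of what such a helper could look like, assuming it wraps each coroutine in a shared `asyncio.Semaphore` before handing everything to `asyncio.gather`. The `demo` coroutine and its counters are hypothetical and exist only to show that the cap holds.

```python
import asyncio


async def gather_with_concurrency(n, *coros):
    """Like asyncio.gather, but lets at most n coroutines run at once."""
    semaphore = asyncio.Semaphore(n)

    async def run_limited(coro):
        # Each coroutine waits for a free slot before it starts running.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run_limited(c) for c in coros))


async def demo(total=10, limit=3):
    """Run `total` dummy jobs and report the highest concurrency observed."""
    active = 0
    peak = 0

    async def job(i):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for a network request
        active -= 1
        return i

    results = await gather_with_concurrency(limit, *(job(i) for i in range(total)))
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Results come back in the order the coroutines were passed in, as with plain `asyncio.gather`; only the number running at any moment is bounded.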

I had the same problem. I'm not 100% sure this will work, but try replacing this:

session = aiohttp.ClientSession()

with this:

connector = aiohttp.TCPConnector(limit=60)
session = aiohttp.ClientSession(connector=connector)

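By default, limit is set to 100 (see the docs), which means the client can have 100 simultaneous connections open at a time. As Andrew mentioned, Windows can only have 64 sockets open at once, so we pass a number lower than 64.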
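
To show where the connector fits, here is a hedged sketch of the header check from the question rewritten around a single shared session. The function names `is_html` and `find_non_html` are hypothetical, the limit of 60 is simply the value suggested above, and this is one possible arrangement rather than a tested fix for the original script.

```python
import asyncio


def is_html(content_type):
    """Return True if a Content-Type header value mentions HTML."""
    return "html" in (content_type or "").lower()


async def find_non_html(urls, limit=60):
    """Return the URLs whose HEAD response does not advertise HTML."""
    # aiohttp is third-party; importing it here keeps is_html usable without it.
    import aiohttp

    # Cap the number of simultaneous connections (and therefore sockets).
    connector = aiohttp.TCPConnector(limit=limit)

    # One session for every request, closed once after all of them finish.
    async with aiohttp.ClientSession(connector=connector) as session:

        async def check(url):
            try:
                async with session.head(url) as response:
                    return url, is_html(response.headers.get("Content-Type"))
            except (aiohttp.ClientError, asyncio.TimeoutError):
                # Failed requests count as non-HTML, as in the question's code.
                return url, False

        results = await asyncio.gather(*(check(u) for u in urls))

    return [url for url, html in results if not html]
```

Unlike the code in the question, the session here is not entered with `async with` inside each request, so one finished request cannot close it while others are still using it.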

# Add this to the calling code, before running any coroutines
loop = asyncio.ProactorEventLoop()
asyncio.set_event_loop(loop)
