是否可以采用诸如work
之类的阻塞函数,并使其在具有多个工作线程的ProcessPoolExecutor
中并发运行?
import asyncio
from time import sleep, time
from concurrent.futures import ProcessPoolExecutor
num_jobs = 4
queue = asyncio.Queue()
executor = ProcessPoolExecutor(max_workers=num_jobs)
loop = asyncio.get_event_loop()
def work():
sleep(1)
async def producer():
for i in range(num_jobs):
results = await loop.run_in_executor(executor, work)
await queue.put(results)
async def consumer():
completed = 0
while completed < num_jobs:
job = await queue.get()
completed += 1
s = time()
loop.run_until_complete(asyncio.gather(producer(), consumer()))
print("duration", time() - s)
在具有 4 个以上内核的计算机上运行上述操作需要 ~4 秒。您将如何编写producer
以使上述示例仅花费~1秒?
await loop.run_in_executor(executor, work)
阻塞循环,直到work
完成,因此您一次只能运行一个函数。
要并发运行作业,您可以使用asyncio.as_completed
:
async def producer():
tasks = [loop.run_in_executor(executor, work) for _ in range(num_jobs)]
for f in asyncio.as_completed(tasks, loop=loop):
results = await f
await queue.put(results)
问题出在producer
.它不允许作业在后台运行,而是等待每个作业完成,从而序列化它们。如果您将producer
重写为如下所示(并保持consumer
不变(,您将获得预期的 1 秒持续时间:
async def producer():
for i in range(num_jobs):
fut = loop.run_in_executor(executor, work)
fut.add_done_callback(lambda f: queue.put_nowait(f.result()))