How can I limit the number of concurrent reads/writes with aiofiles?



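My program uses aiohttp to download roughly 10 million records concurrently and then writes the data to about 4,000 files on disk.

I use the aiofiles library because I want my program to keep doing other work while it reads and writes files.

But I am worried that if the program tries to write to all 4,000 files at the same time, the hard drive will not be able to keep up with the writes.

Is it possible to limit the number of concurrent writes with aiofiles (or another library)? Does aiofiles already do this?

Thanks.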

Test code:

import aiofiles
import asyncio

async def write_to_disk(fname):
    async with aiofiles.open(fname, "w+") as f:
        await f.write("asdf")

async def main():
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i))
             for i in range(10)]
    await asyncio.gather(*tasks)

asyncio.run(main())

You can use asyncio.Semaphore to limit the number of concurrent tasks. Just make the write_to_disk function acquire the semaphore before it writes:

import aiofiles
import asyncio

async def write_to_disk(fname, sema):
    # Edit to address comment: acquire semaphore after opening file
    async with aiofiles.open(fname, "w+") as f, sema:
        print("Writing", fname)
        await f.write("asdf")
        print("Done writing", fname)

async def main():
    sema = asyncio.Semaphore(3)  # Allow 3 concurrent writers
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i, sema)) for i in range(10)]
    await asyncio.gather(*tasks)

asyncio.run(main())

Note the line sema = asyncio.Semaphore(3) and the addition of sema to the async with statement.

Output:

"""
Writing 1.txt
Writing 0.txt
Writing 2.txt
Done writing 1.txt
Done writing 0.txt
Done writing 2.txt
Writing 3.txt
Writing 4.txt
Writing 5.txt
Done writing 3.txt
Done writing 4.txt
Done writing 5.txt
Writing 6.txt
Writing 7.txt
Writing 8.txt
Done writing 6.txt
Done writing 7.txt
Done writing 8.txt
Writing 9.txt
Done writing 9.txt
"""
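
One thing to keep in mind: in the version above the file is opened before the semaphore is acquired, so the semaphore caps the writes in flight but not the number of files open at once. With thousands of tasks that could plausibly run into the operating system's open-file limit. The sketch below shows the other ordering, where the semaphore is taken first. To keep it runnable with the standard library alone, a plain blocking open() run through asyncio.to_thread (Python 3.9+) stands in for aiofiles; the helper _write_blocking, the function write_all, and the stats dictionary are hypothetical names used only for illustration. The same ordering should carry over to aiofiles.open by nesting it inside async with sema.

```python
import asyncio
import os
import tempfile

def _write_blocking(path, text):
    # Plain blocking write; it runs in a worker thread via asyncio.to_thread.
    # Assumption: this stands in for the aiofiles open/write pair.
    with open(path, "w") as f:
        f.write(text)

async def write_to_disk(path, sema, stats):
    # Take the semaphore BEFORE the file is opened, so at most `limit`
    # files are open (and being written) at any moment.
    async with sema:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.to_thread(_write_blocking, path, "asdf")
        finally:
            stats["active"] -= 1

async def write_all(directory, n_files=10, limit=3):
    sema = asyncio.Semaphore(limit)
    # Hypothetical bookkeeping, only there to observe the peak concurrency.
    stats = {"active": 0, "peak": 0}
    paths = [os.path.join(directory, "%d.txt" % i) for i in range(n_files)]
    await asyncio.gather(*(write_to_disk(p, sema, stats) for p in paths))
    return stats["peak"], paths

with tempfile.TemporaryDirectory() as tmp:
    peak, paths = asyncio.run(write_all(tmp))
    # Total bytes on disk: each file should hold the 4 characters "asdf".
    written = sum(os.path.getsize(p) for p in paths)
```

Here peak records how many writers held the semaphore at the same time, which should never exceed the limit passed to asyncio.Semaphore. Which ordering is preferable depends on whether open file handles or only the writes themselves are the scarce resource.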
