How can I use a thread pool to download 1000+ images faster? Downloading these 1000+ images takes too long with my current script.

My current script:
import requests

image_url = [
    'http://image_eg_001',
    'http://image_eg_002',
    'http://image_eg_003',
]

for img in image_url:
    file_name = img.split('/')[-1]
    print("Downloading File: %s" % file_name)
    r = requests.get(img, stream=True)
    with open(file_name, 'wb') as f:
        for chunk in r:
            f.write(chunk)
You can use the asyncio and aiohttp packages to perform concurrent network requests. One possible solution:
import asyncio
import aiohttp

async def download_image(image_url: str, save_path: str, session: aiohttp.ClientSession):
    async with session.get(image_url) as response:
        content = await response.read()
    with open(save_path, "wb") as f:
        f.write(content)

async def main():
    image_urls = [...]
    save_paths = [...]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[download_image(im, p, session) for im, p in zip(image_urls, save_paths)])

if __name__ == "__main__":
    asyncio.run(main())
The download_image() function downloads and saves a single image. The main() function uses asyncio.gather() to execute the requests concurrently.
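To see what asyncio.gather() buys you, here is a minimal, network-free sketch: the fake_download coroutine (a hypothetical stand-in for the real request) just sleeps, so three 0.1-second "downloads" running concurrently finish in roughly 0.1 seconds instead of 0.3.

```python
import asyncio
import time

async def fake_download(name: str, delay: float) -> str:
    # Stand-in for a network request: sleep, then return a label.
    await asyncio.sleep(delay)
    return name

async def main() -> list:
    # gather() runs all three coroutines concurrently and returns
    # their results in the order the coroutines were passed in.
    return await asyncio.gather(
        fake_download("a", 0.1),
        fake_download("b", 0.1),
        fake_download("c", 0.1),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # ['a', 'b', 'c']
```

Because the tasks overlap, elapsed stays well under the 0.3 seconds a sequential loop would take.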
Alternatively, you can use the concurrent.futures.ThreadPoolExecutor class. I chose 100 as the number of worker threads, but you can change it for your system; it may need to be more or fewer depending on your situation. If the downloads take a long time, too many worker threads can seriously hurt your responsiveness and system resources.
Here is the thread-pool solution for downloading the images:
import requests
from concurrent.futures import ThreadPoolExecutor

image_url = [
    'http://image_eg_001',
    'http://image_eg_002',
    'http://image_eg_003',
]

def download(url):
    r = requests.get(url, allow_redirects=False)
    with open(url.split("/")[-1], "wb") as binary:
        binary.write(r.content)

with ThreadPoolExecutor(max_workers=100) as executor:
    executor.map(download, image_url)
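Two properties of executor.map() are worth knowing: it yields results in input order (regardless of which thread finishes first), and an exception raised in a worker is re-raised when you iterate the results, so errors are not silently swallowed. A network-free sketch, with a placeholder square function standing in for download():

```python
from concurrent.futures import ThreadPoolExecutor

def square(n: int) -> int:
    # Placeholder workload; any exception raised here would surface
    # when the map() results are iterated below.
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    # Results come back in input order, not completion order.
    results = list(executor.map(square, [1, 2, 3, 4]))

print(results)  # [1, 4, 9, 16]
```

Note that executor.map() is lazy: if you discard its return value without iterating, worker exceptions go unnoticed, so collecting the results (as above) is good practice even when you don't need them.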