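I wrote the following function and tested it in the Python shell, where the images downloaded successfully. When I run it as a script, however, no images are downloaded.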
import os
import requests
from time import time
import uuid
from multiprocessing.pool import ThreadPool

main_file_name = 'test1.csv'
my_set = set()
with open(main_file_name, 'r') as f:  # read image urls
    for row in f:
        my_set.add(row.split(',')[2].strip())

def get_url(entry):
    path = str(uuid.uuid4()) + ".jpg"
    if not os.path.exists(path):
        r = requests.get(entry, stream=True)
        if r.status_code == 200:
            with open(path, 'wb') as f:
                for chunk in r:
                    f.write(chunk)

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
print(f"Elapsed Time: {time() - start}")
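I double-checked and it works in the shell. Is there something I am missing in the script?
results is an instance of the multiprocessing.pool.IMapUnorderedIterator class. A good way to make sure the URLs are actually downloaded is to loop over results: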
start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
for _ in results:  # consuming the iterator blocks until every download is done
    pass
print(f"Elapsed Time: {time() - start}")
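Looping over the results has a second benefit: an exception raised inside a worker is re-raised in the main thread when that result is fetched from the iterator, so failures do not go unnoticed. Here is a minimal sketch of that behavior. The names fake_download and run_all are hypothetical, and fake_download stands in for get_url so the example runs without network access:

```python
from multiprocessing.pool import ThreadPool

def fake_download(entry):
    # Hypothetical stand-in for get_url: fails for one input, no network access.
    if entry == "bad":
        raise ValueError(f"cannot download {entry}")
    return entry

def run_all(entries):
    done, errors = [], []
    with ThreadPool(4) as pool:
        results = pool.imap_unordered(fake_download, entries)
        while True:
            try:
                # next() blocks until a result is ready and re-raises worker errors.
                done.append(next(results))
            except StopIteration:
                break
            except ValueError as exc:
                errors.append(str(exc))
    return sorted(done), errors
```

The iterator is consumed inside the with block on purpose: leaving the block terminates the pool, so anything still pending at that point would be dropped.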
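Another way to get the same effect is to keep the main thread from exiting before the downloads finish, for example with time.sleep. This is fragile, because the delay has to be long enough for every download to complete: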
from time import sleep

start = time()
results = ThreadPool(8).imap_unordered(get_url, my_set)
sleep(10)  # make sure this amount is enough to finish downloading
print(f"Elapsed Time: {time() - start}")
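Instead of guessing a delay, the pool itself can be asked to wait: close() tells it that no more tasks are coming, and join() blocks until the submitted tasks have finished. The following is only a sketch of that pattern. The names slow_task and run_and_wait are hypothetical, and slow_task sleeps briefly in place of a real download:

```python
from multiprocessing.pool import ThreadPool
from time import sleep

def run_and_wait(items):
    finished = []

    def slow_task(n):
        # Hypothetical stand-in for get_url: sleeps instead of downloading.
        sleep(0.05)
        finished.append(n)

    pool = ThreadPool(4)
    pool.imap_unordered(slow_task, items)
    pool.close()  # no more tasks will be submitted
    pool.join()   # block until every submitted task has finished
    return sorted(finished)
```

Unlike sleep(10), this waits exactly as long as the work takes. It does not surface exceptions raised in the workers, though, so looping over the results is still the safer choice when errors matter.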
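The reason your script does not work is that it ends immediately after results is created: imap_unordered returns without waiting for the downloads, and the pool's worker threads are daemon threads, so they are killed when the main thread exits. Running python3 -i test.py (or simply copy-pasting your code into the shell) works because the interpreter is not killed (the main thread is still alive), so the images have time to download.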