I'm writing a simple script to preprocess an image dataset, which includes resizing the images and applying filters.
Here is my code:
def preprocessing(tar_ratio, img_paths, label_paths,
                  save_dir="output", resampling_mode=None):
    # with concurrent.futures.ThreadPoolExecutor() as executor:
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for img_path, label_path in zip(img_paths, label_paths):
            src_ratio = get_ratio(label_path)
            if src_ratio is not np.nan:
                executor.submit(
                    process_single(src_ratio, tar_ratio, img_path, label_path,
                                   save_dir=save_dir, resampling_mode=resampling_mode)
                )
            else:
                pass
I think the work is mostly CPU-bound, so multiprocessing should be a better fit than multithreading. But after trying both, neither works as expected: only two CPU cores are ever used.
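I have read the post below and I'm wondering whether there is an updated approach that uses concurrent.futures:
"How to utilize all cores with python multiprocessing"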
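Executor.submit takes a callable as its first argument, followed by the arguments to pass to it. Your code calls process_single yourself, so it runs sequentially in the main process and only its return value is handed to submit. Try this instead: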
executor.submit(
    process_single,
    src_ratio,
    tar_ratio,
    img_path,
    label_path,
    save_dir=save_dir,
    resampling_mode=resampling_mode,
)
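To make the difference concrete, here is a minimal sketch comparing the two call styles. `square` is a hypothetical stand-in for process_single, and ThreadPoolExecutor is used only so the snippet runs without a main guard; submit has the same contract on ProcessPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Hypothetical stand-in for process_single.
    return x * x


def compare_submit_styles():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Wrong: square(3) runs right here in the caller, and its result (9)
        # is submitted as if it were the callable.
        wrong = executor.submit(square(3))
        # Right: pass the function and its arguments separately, so the
        # call happens inside a worker.
        right = executor.submit(square, 3)
    # Leaving the with-block waits for both futures to finish.
    return wrong.exception(), right.result()


error, value = compare_submit_styles()
print(type(error).__name__, value)
```

Note that the wrong call does not raise at the submit site. The failure is stored on the returned future, which is probably why the original loop appeared to run without errors.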
A simple example that illustrates the correct usage:
test.py:
import random
import time
from concurrent.futures import ProcessPoolExecutor


def worker(i):
    # Sleep for a random duration to simulate work of varying length.
    t = random.uniform(1, 5)
    print(f"START: {i} ({t:.2f}s)")
    time.sleep(t)
    print(f"END: {i}")
    return i * 2


def main():
    futures = []
    with ProcessPoolExecutor() as executor:
        for i in range(5):
            # Pass the callable and its argument separately.
            futures.append(executor.submit(worker, i))
    # Results are read in submission order, not completion order.
    print([f.result() for f in futures])


if __name__ == "__main__":
    main()
Example run:
$ python test.py
START: 0 (3.16s)
START: 1 (1.68s)
START: 2 (2.76s)
START: 3 (1.53s)
START: 4 (4.05s)
END: 3
END: 1
END: 2
END: 0
END: 4
[0, 2, 4, 6, 8]
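Since exceptions raised by a submitted task are stored on its future rather than raised where submit is called, it can help to collect the futures and inspect each one. The following is only a sketch under stated assumptions: `checked_sqrt` and `run_all` are hypothetical names, and `checked_sqrt` stands in for process_single.

```python
import math
from concurrent.futures import ProcessPoolExecutor, as_completed


def checked_sqrt(x):
    # Hypothetical worker standing in for process_single; fails on negative input.
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return math.sqrt(x)


def run_all(executor_cls, items):
    """Submit one task per item and split the outcomes into results and errors."""
    results, errors = {}, {}
    with executor_cls() as executor:
        # Map each future back to the input it was created for.
        future_to_item = {executor.submit(checked_sqrt, item): item for item in items}
        for future in as_completed(future_to_item):
            item = future_to_item[future]
            try:
                # result() re-raises the worker's exception, if there was one.
                results[item] = future.result()
            except Exception as exc:
                errors[item] = exc
    return results, errors


if __name__ == "__main__":
    # The guard matters for ProcessPoolExecutor on platforms that spawn workers.
    results, errors = run_all(ProcessPoolExecutor, [1, 4, -9, 16])
    print(results)
    print(errors)
```

Applied to your loop, the same pattern would report any image whose processing failed instead of dropping it silently.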