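Multiprocessing over a directory tree is not working as expected. I am trying to add all .iso files to a single set() and output only that set. I know I am telling Python to return None, but I don't know how to do this without returning None. How do I output a single set from multiprocessing?
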
    import itertools
    import multiprocessing
    import os


    def worker(filename):
        data_set = set()
        if ".iso" in filename:
            data_set.add(filename)
        return data_set if len(data_set) != 0 else None


    def search_for_iso(dirname=None, verbose=False, default_path="/"):
        iso_found = set()
        if dirname is None:
            pool = multiprocessing.Pool(processes=48)
            walker = os.walk(default_path)
            file_data_gen = itertools.chain.from_iterable((
                os.path.join(root, f) for f in files) for root, sub, files in walker)
            results = pool.map(worker, file_data_gen)
            return results

As of now, it outputs the following:

    set(['/test.iso', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, .....])  # whole lot of None's

Expected output:

    set(['/test.iso'])

I found a solution by running through the results and checking whether each file name ends with .iso:

    def worker(filename):
        if filename.endswith(".iso"):
            return filename


    def search_for_iso(dirname=None, verbose=False, default_path="/"):
        retval = set()
        if dirname is None:
            pool = multiprocessing.Pool(processes=48)
            walker = os.walk(default_path)
            file_data_gen = itertools.chain.from_iterable((
                os.path.join(root, f) for f in files) for root, sub, files in walker)
            results = pool.map(worker, file_data_gen)
            for data in results:
                if data is not None:
                    retval.add(data)
        return retval

This seems to work just fine and doesn't seem to slow the process down at all.

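First of all, you won't gain any extra performance from using multiple processes here, because the work is I/O-bound: you still have to wait on the file system, which is much slower than the CPU.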
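
If the pool really adds nothing, the same search can be done in a single process. The following is a minimal sketch, not a benchmarked replacement: the function name `find_iso_files` and its `root` parameter are made up for illustration, and it matches on `endswith(".iso")` as in the question's second version.

```python
import os


def find_iso_files(root="/"):
    """Return the set of paths under `root` whose file names end with '.iso'.

    Hypothetical helper: walks the tree once in the current process and
    builds the set directly, so no None placeholders are ever produced.
    """
    return {
        os.path.join(dirpath, name)
        for dirpath, _subdirs, files in os.walk(root)
        for name in files
        if name.endswith(".iso")
    }
```

Calling `find_iso_files("/")` would then give the single set the question asks for, without any post-filtering.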

As for your current code, just return the set, even if it is empty:

    def worker(filename):
        data_set = set()
        if ".iso" in filename:
            data_set.add(filename)
        return data_set
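
With a worker that always returns a set, `pool.map` gives back a list with one set per file, so the sets still have to be merged to get a single result. A small sketch of that merge step follows. `merge_results` is a hypothetical helper name, the worker here uses `endswith` rather than `in`, and the built-in `map` stands in for `pool.map` so that the snippet runs without starting any processes.

```python
def worker(filename):
    """Return a one-element set for an .iso path, otherwise an empty set."""
    return {filename} if filename.endswith(".iso") else set()


def merge_results(results):
    """Merge an iterable of sets into one set (hypothetical helper)."""
    # set().union() with no arguments returns an empty set, so an empty
    # result list is handled as well.
    return set().union(*results)


# Stand-in for: results = pool.map(worker, file_data_gen)
paths = ["/test.iso", "/etc/hosts", "/tmp/notes.txt"]
results = list(map(worker, paths))
iso_found = merge_results(results)
```

Empty sets contribute nothing to the union, so no `None` check is needed afterwards.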