Multiprocessing in a for loop

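I have the matching() function below, which contains a for loop, and I will be passing it a large generator (unique_combinations).

It takes days to process, so I would like to use multiprocessing on the elements of the loop to speed it up, but I just can't figure out how to do it.

In general, I find the logic behind concurrent.futures hard to understand.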

results = []
match_score = []

def matching():
    for pair in unique_combinations:
        if fuzz.ratio(pair[0], pair[1]) > 90:
            results.append(pair)
            match_score.append(fuzz.ratio(pair[0], pair[1]))

def main():
    executor = ProcessPoolExecutor(max_workers=3)
    task1 = executor.submit(matching)
    task2 = executor.submit(matching)
    task3 = executor.submit(matching)

if __name__ == '__main__':
    main()
    print(results)
    print(match_score)

I thought this would speed up execution.

If you are already using concurrent.futures, the best approach, in my opinion, is to use map:

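import concurrent.futures

# Assumption: the question does not say which library provides fuzz;
# thefuzz (formerly fuzzywuzzy) is used here.
from thefuzz import fuzz

def matching(pair):
    fuzz_ratio = fuzz.ratio(pair[0], pair[1])  # only calculate this once
    if fuzz_ratio > 90:
        return pair, fuzz_ratio
    else:
        return None

def main():
    results = []
    match_score = []
    # placeholder string pairs; substitute the real unique_combinations here
    unique_combinations = [("apple", "apple"), ("apple", "appel"), ("apple", "zebra")]
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        for result in executor.map(matching, unique_combinations, chunksize=100):
            if result:
                # handle the results somehow
                results.append(result[0])
                match_score.append(result[1])
    print(results)
    print(match_score)

if __name__ == '__main__':
    main()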

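There are many ways to handle the results, but the gist is to return a value from matching and then collect those values in main by looping over executor.map. See the concurrent.futures documentation for details.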
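For anyone who wants to try the pattern without installing a fuzzy-matching library, here is a minimal sketch that uses only the standard library. It rests on a few assumptions: difflib.SequenceMatcher stands in for fuzz.ratio (its scores will not necessarily equal those of the real library), and the names similarity, find_matches and the sample word list are hypothetical, made up for illustration. It also shows why the lists in the question stay empty: each worker process gets its own copy of the module-level lists, so appends made in a worker never reach the parent. Returning values from the worker and collecting them in the parent avoids that.

```python
import concurrent.futures
import itertools
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for fuzz.ratio (assumption): integer score from 0 to 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def matching(pair, threshold=90):
    """Return (pair, score) when the score exceeds the threshold, else None."""
    score = similarity(pair[0], pair[1])
    return (pair, score) if score > threshold else None


def find_matches(pairs, max_workers=4, chunksize=100):
    """Score pairs in worker processes and collect the hits in the parent."""
    results, match_score = [], []
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        # map yields results in the same order as the input pairs
        for result in executor.map(matching, pairs, chunksize=chunksize):
            if result is not None:
                pair, score = result
                results.append(pair)
                match_score.append(score)
    return results, match_score


def main():
    # Hypothetical sample data; the real input would be the large generator.
    words = ["multiprocessing", "multiprocesing", "threading", "treading"]
    unique_combinations = itertools.combinations(words, 2)
    results, match_score = find_matches(unique_combinations)
    print(results)
    print(match_score)


if __name__ == "__main__":
    main()
```

One caveat for very large inputs: Executor.map collects its input iterables up front rather than lazily, so a huge generator will be consumed in full before results start coming back. If memory becomes a problem, feeding the pool in batches (for example with itertools.islice) is one possible workaround.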
