Python multiprocessing: reduce during map?

Is there a way to reduce memory consumption when using Python's pool.map?

A short example: worker() does some heavy lifting and returns a large array...

def worker(item):
    # item is one element of large_sequence (pool.map passes exactly one argument)
    # cpu time intensive tasks
    return large_array

...and the pool maps it over some large sequence:

with mp.Pool(mp.cpu_count()) as p:
    result = p.map(worker, large_sequence)

Given this setup, result will obviously take up a large share of system memory. However, the final operation on the result is:

final_result = np.sum(result, axis=0)

So NumPy effectively does nothing more than reduce the iterable with a sum operation:

final_result = reduce(lambda x, y: x + y, result)
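
To check that the two forms really agree, here is a minimal sketch. The small `result` list is a hypothetical stand-in for what p.map would return, not the real data. As far as I understand, np.sum first stacks the whole list into a single array before summing, whereas reduce only ever combines two arrays at a time.

```python
from functools import reduce
import operator

import numpy as np

# Hypothetical stand-in for the list returned by p.map(worker, large_sequence).
result = [np.full((2, 3), float(i)) for i in range(4)]

# np.sum converts the list into one (4, 2, 3) array, then sums along axis 0.
summed = np.sum(result, axis=0)

# reduce folds the list pairwise: ((r0 + r1) + r2) + r3.
folded = reduce(operator.add, result)

print(np.array_equal(summed, folded))  # prints True
```

Because the fold touches one element at a time, it does not need the full list to exist up front, which is what makes reducing on the fly attractive here.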

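This would, of course, make it possible to consume the pool.map results as they come in and let them be garbage-collected once reduced, so that all the values never have to be stored first.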

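I could write some mp.Queue that the results go into, and then write a queue-consuming worker that sums them up, but this (1) requires significantly more lines of code and (2) feels to me like a (potentially slower) hack rather than clean code.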

Is there a way to reduce the results returned by an mp.Pool operation?

The iterator mappers imap and imap_unordered seem to do the trick:

#!/usr/bin/env python3
import multiprocessing
import numpy as np

def worker(a):
    # cpu time intensive tasks
    large_array = np.ones((20, 30)) + a
    return large_array

if __name__ == '__main__':

    arraysum = np.zeros((20, 30))
    large_sequence = range(20)
    num_cpus = multiprocessing.cpu_count()

    with multiprocessing.Pool(processes=num_cpus) as p:
        for large_array in p.imap_unordered(worker, large_sequence):
            arraysum += large_array
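
If you prefer the reduce form from the question, the same idea can be sketched with functools.reduce consuming the lazy iterator directly. This is only one possible way to write it: the helper names `accumulate` and `streaming_sum` and the constant `SHAPE` are made up for illustration, and the pool is limited to 2 processes just to keep the example small.

```python
from functools import reduce
import multiprocessing

import numpy as np

SHAPE = (20, 30)  # assumed array shape, same as in the example above


def worker(a):
    # Stand-in for the CPU-intensive task.
    return np.ones(SHAPE) + a


def accumulate(total, chunk):
    # Add in place so only the running total stays alive between steps.
    total += chunk
    return total


def streaming_sum(arrays):
    # Accepts any iterable of arrays, including the lazy iterator
    # returned by Pool.imap_unordered.
    return reduce(accumulate, arrays, np.zeros(SHAPE))


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as p:
        total = streaming_sum(p.imap_unordered(worker, range(20)))
    print(total[0, 0])  # prints 210.0
```

One caveat, as far as I know: the pool does not wait for the consumer, so if the workers produce arrays faster than the main process adds them up, finished results can still pile up in the parent for a while. Since addition is commutative, the arbitrary arrival order of imap_unordered does not affect the sum.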
