I'm trying to work out the best way to parallelize a program like this:
global_data = some data
global_data2 = some data
data_store1 = np.empty(n)
data_store2 = np.empty(n)
...

def simulation(global_data):
    # retrieve values from the global datasets and set the
    # corresponding elements of the global data stores
so that I can do something like passing list(enumerate(global_data)) to a multiprocessing function, with each process setting the element of the global data stores that corresponds to the (index, value) pair it receives. I'm running on a high-performance cluster with 128 cores, so I think multiprocessing is preferable to threading.
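For concreteness, here is a minimal sketch of that pattern (the names and the doubling calculation are placeholders; the workers return (index, result) pairs for the main process to store, since a worker process cannot mutate the parent's arrays directly):

import multiprocessing

import numpy as np

# Placeholder stand-ins for the question's datasets and stores:
n = 3
global_data = [10, 20, 30]
data_store1 = np.empty(n)

def simulation(indexed_elem):
    # Hypothetical worker: receives one (index, value) pair from enumerate()
    index, value = indexed_elem
    return index, value * 2  # stand-in for the real calculation

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        # The (index, result) pairs come back to the main process, which
        # writes them into the data store on the workers' behalf:
        for index, result in pool.map(simulation, list(enumerate(global_data))):
            data_store1[index] = result
    print(data_store1)  # [20. 40. 60.]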
If you use a multiprocessing pool (e.g. a multiprocessing.Pool instance) together with its map method, then the worker function simulation just needs to return its result back to the main process, which will end up with a list of those results in the correct order. This is less costly than using a managed list to which the worker function appends its results:
import multiprocessing

def simulation(global_data_elem):
    # We are passed a single element of global_data.
    # Do the calculation with global_data_elem and return the result.
    # The CPU resources required to do this calculation must be sufficiently
    # high to justify the additional overhead of multiprocessing (which is
    # not the case for this demo):
    return global_data_elem * 2

def main():
    # global_data is some data (not necessarily at global scope):
    global_data = ['a', 'b', 'c']
    # Create a pool of the correct size (no larger than the number of cores
    # we have, nor the number of tasks being submitted):
    pool = multiprocessing.Pool(min(len(global_data), multiprocessing.cpu_count()))
    # Results are returned in the correct order (task submission order):
    results = pool.map(simulation, global_data)
    print(results)

# Required for Windows:
if __name__ == '__main__':
    main()
Prints:
['aa', 'bb', 'cc']
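For contrast, here is a sketch of the managed-list alternative mentioned above, under the assumption that the proxy list is handed to the workers packed into their arguments; every append is an IPC round trip to the manager process, and the result order must be reconstructed by hand:

import multiprocessing

def simulation(args):
    # Hypothetical worker: appends its (index, result) pair to a shared,
    # manager-backed list instead of returning it.
    managed_results, index, elem = args
    managed_results.append((index, elem * 2))

def main():
    global_data = ['a', 'b', 'c']
    with multiprocessing.Manager() as manager:
        managed_results = manager.list()
        pool = multiprocessing.Pool(min(len(global_data), multiprocessing.cpu_count()))
        pool.map(simulation, [(managed_results, i, e) for i, e in enumerate(global_data)])
        pool.close()
        pool.join()
        # Unlike pool.map's return value, the managed list carries no
        # ordering guarantee, so sort by index before use:
        results = [r for _, r in sorted(managed_results)]
    print(results)  # ['aa', 'bb', 'cc']

if __name__ == '__main__':
    main()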