我要做的是检查哪种多处理最适合我的数据。我试图多处理此循环:
def __pure_calc(args):
j = args[0]
point_array = args[1]
empty = args[2]
tree = args[3]
for i in j:
p = tree.query(i)
euc_dist = math.sqrt(np.sum((point_array[p[1]]-i)**2))
##add one row at a time to empty list
empty.append([i[0], i[1], i[2], euc_dist, point_array[p[1]][0], point_array[p[1]][1], point_array[p[1]][2]])
return empty
只是纯函数 6.52秒
我的第一种方法是多处理。映射:
from multiprocessing import Pool
def __multiprocess(las_point_array, point_array, empty, tree):
pool = Pool(os.cpu_count())
for j in las_point_array:
args=[j, point_array, empty, tree]
results = pool.map(__pure_calc, args)
#close the pool and wait for the work to finish
pool.close()
pool.join()
return results
当我检查其他答案如何多流程函数时,应该很容易:映射(呼叫函数,输入( - 完成。但是,由于某种原因,我的多流行器并不是除了我的输入外,scipy.spatial.ckdtree.ckdtree对象的上升错误是无法订阅的。
所以我尝试使用apply_async:
from multiprocessing.pool import ThreadPool
def __multiprocess(arSegment, wires_point_array, ptList, tree):
pool = ThreadPool(os.cpu_count())
args=[arSegment, point_array, empty, tree]
result = pool.apply_async(__pure_calc, [args])
results = result.get()
它没有问题。对于我的测试数据,我设法在 6.42 sec中进行计算。
为什么apply_async接受ckdtree,毫无问题和池。我需要更改以使其运行?
pool.map(function, iterable)
,它与itertool的map
基本上具有相同的占地面积。来自ITABLE的每个项目都是__pure_calc
函数的args
。
在这种情况下,我想您可能会更改为:
def __multiprocess(las_point_array, point_array, empty, tree):
pool = Pool(os.cpu_count())
args_list = [
[j, point_array, empty, tree]
for j in las_point_array
]
results = pool.map(__pure_calc, args_list)
#close the pool and wait for the work to finish
pool.close()
pool.join()
return results