Python多进程/螺纹循环

我要做的是检查哪种多处理最适合我的数据。我试图多处理此循环：

def __pure_calc(args):
    j = args[0]
    point_array = args[1]
    empty = args[2]
    tree = args[3] 
    for i in j:
            p = tree.query(i)   
            euc_dist = math.sqrt(np.sum((point_array[p[1]]-i)**2))  
            ##add one row at a time to empty list
            empty.append([i[0], i[1], i[2], euc_dist, point_array[p[1]][0], point_array[p[1]][1], point_array[p[1]][2]]) 
    return empty

只是纯函数 6.52秒

我的第一种方法是多处理。映射：

from multiprocessing import Pool 
def __multiprocess(las_point_array, point_array, empty, tree):
    pool = Pool(os.cpu_count()) 
    for j in las_point_array:
        args=[j, point_array, empty, tree]
        results = pool.map(__pure_calc, args)
    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 
    return results

当我检查其他答案如何多流程函数时，应该很容易：映射(呼叫函数，输入( - 完成。但是，由于某种原因，我的多流行器并不是除了我的输入外，scipy.spatial.ckdtree.ckdtree对象的上升错误是无法订阅的。

所以我尝试使用apply_async：

from multiprocessing.pool import ThreadPool
def __multiprocess(arSegment, wires_point_array, ptList, tree):
    pool = ThreadPool(os.cpu_count())
    args=[arSegment, point_array, empty, tree]
    result = pool.apply_async(__pure_calc, [args])
    results = result.get()

它没有问题。对于我的测试数据，我设法在 6.42 sec中进行计算。

为什么apply_async接受ckdtree，毫无问题和池。我需要更改以使其运行？

pool.map(function, iterable)，它与itertool的map基本上具有相同的占地面积。来自ITABLE的每个项目都是__pure_calc函数的args。

在这种情况下，我想您可能会更改为：

def __multiprocess(las_point_array, point_array, empty, tree):
    pool = Pool(os.cpu_count()) 
    args_list = [
        [j, point_array, empty, tree]
        for j in las_point_array
    ]
    results = pool.map(__pure_calc, args_list)
    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 
    return results

相关内容

最新更新

热门标签：