Optimizing area and centroid computation for connected-component-labelled subsets in Python



I have a binary map on which I run connected-component labelling, and for a 64x64 grid I get something like this: http://pastebin.com/bauas0NJ

Now I want to group the pixels by label so that I can find each component's area and centroid:

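import numpy as np

# ccl_np is the computed array from the previous step (see pastebin)
# I discard the label '1' (the first entry of unique) as it's the background
unique, count = np.unique(ccl_np, return_counts = True)
xcm_array = []
ycm_array = []
for i in range(1,len(unique)):
    # Row and column indices of every pixel carrying this label
    subarray = np.where(ccl_np == unique[i])
    # Centroid = mean row index and mean column index, formatted to 5 decimals
    xcm_array.append("{0:.5f}".format((sum(subarray[0]))/(count[i]*1.)))
    ycm_array.append("{0:.5f}".format((sum(subarray[1]))/(count[i]*1.)))
# (x centroid, y centroid, area) for every non-background label
final_array = zip(xcm_array,ycm_array,count[1:])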
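With a 4096x4096 grid in mind, note that the loop above calls np.where once per label, so it rescans the whole grid for every component. As a sketch of a single-pass alternative (this uses np.bincount, which is neither the question's nor the answer's approach), assuming the labels are non-negative integers; the helper name label_areas_and_centroids is hypothetical:

```python
import numpy as np


def label_areas_and_centroids(labels):
    """Area and (row, col) centroid of every label in a single pass.

    Assumes `labels` is a 2-D array of non-negative integer labels.
    Index k of each returned array refers to label k; labels that never
    occur get area 0 and a NaN centroid.
    """
    flat = labels.ravel()
    rows, cols = np.indices(labels.shape)  # row/column index of every pixel
    area = np.bincount(flat)               # pixel count per label
    with np.errstate(invalid="ignore", divide="ignore"):
        # Sum of row (column) indices per label, divided by the area = mean
        row_c = np.bincount(flat, weights=rows.ravel()) / area
        col_c = np.bincount(flat, weights=cols.ravel()) / area
    return area, row_c, col_c


labels = np.array([[1, 1, 2],
                   [1, 3, 2],
                   [3, 3, 2]])
area, row_c, col_c = label_areas_and_centroids(labels)
```

Because index k corresponds to label k, the background entry (label 1 in the question) and any zero-area labels can be dropped afterwards, for example with keep = area > 0; keep[1] = False.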

I need this to be fast (I will be running it on 4096x4096 grids), and I was told to look at numba. Here is my naive attempt:

import numpy as np
import numba

unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)
@numba.jit  # written as numba.autojit in older Numba releases
def mysolver(xcm_array, ycm_array, inverse, count):
    for i in range(64):
        for j in range(64):
            # Index of this pixel's label in unique/count
            pos = inverse[i][j]
            local_count = count[pos]
            # Add this pixel's share of its label's mean row/column
            xcm_array[pos] += i/(local_count*1.)
            ycm_array[pos] += j/(local_count*1.)

mysolver(xcm_array, ycm_array, inverse, count)
final_array = zip(xcm_array,ycm_array,count)

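To my surprise, the numba version was slower than the previous one, or at best equally fast. What am I doing wrong? Also, could this be done in Cython, and would that be faster?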

I am using the packages included in the latest Anaconda Python 2.7 distribution.

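I think the problem may be that you are timing the jit'd code incorrectly. The first time you run the code, the measured time includes the time numba needs to compile it; this is often called warming up the JIT. Once the function has been compiled, later calls no longer pay that cost.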

import numpy as np
import numba as nb

# ccl_np is the labelled 64x64 array from the question
unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)

# Plain Python version: the same loop as in the question
def mysolver(xcm_array, ycm_array, inverse, count):
    for i in range(64):
        for j in range(64):
            pos = inverse[i][j]
            local_count = count[pos]
            xcm_array[pos] += i/(local_count*1.)
            ycm_array[pos] += j/(local_count*1.)

# Same loop compiled by numba in nopython mode, indexing with
# inverse[i, j] rather than inverse[i][j]
@nb.jit(nopython=True)
def mysolver_nb(xcm_array, ycm_array, inverse, count):
    for i in range(64):
        for j in range(64):
            pos = inverse[i,j]
            local_count = count[pos]
            xcm_array[pos] += i/(local_count*1.)
            ycm_array[pos] += j/(local_count*1.)

Then time both with %timeit, which runs the code many times. First the pure Python version:

In [4]: %timeit mysolver(xcm_array, ycm_array, inverse, count)
10 loops, best of 3: 25.8 ms per loop

Then with numba:

In [5]: %timeit mysolver_nb(xcm_array, ycm_array, inverse, count)
The slowest run took 3630.44 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 33.1 µs per loop

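The numba code is roughly 780 times faster here (25.8 ms versus 33.1 µs per loop), i.e. close to three orders of magnitude.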
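The "slowest run took 3630.44 times longer" warning above is consistent with the first call paying the compilation cost. Two further caveats: both solvers accumulate with += into arrays created outside the function, so every extra %timeit loop keeps adding into the same xcm_array/ycm_array (the timing is still meaningful, the values are not); and the grid size is hard-coded as 64. Below is a minimal sketch of the same measurement outside IPython, assuming Python 3 and Numba's njit decorator (with a do-nothing fallback if Numba is not installed). It takes the shape from the array, allocates fresh output arrays per call, and times the first (compiling) call separately from a second one. The names accumulate_centroids and timed_call are hypothetical.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: without Numba the function stays plain Python,
    # so there is no compilation step and no speed-up, only the timing pattern.
    def njit(func):
        return func


@njit
def accumulate_centroids(inverse, count, xcm, ycm):
    # Same loop as mysolver_nb, but the grid size comes from the array
    # instead of being hard-coded as 64.
    n_rows, n_cols = inverse.shape
    for i in range(n_rows):
        for j in range(n_cols):
            pos = inverse[i, j]
            xcm[pos] += i / count[pos]  # true division (Python 3 semantics)
            ycm[pos] += j / count[pos]


def timed_call(inverse, count):
    # Fresh zeroed outputs on every call, so repeated timings do not keep
    # accumulating into the same arrays.
    xcm = np.zeros(len(count))
    ycm = np.zeros(len(count))
    start = time.perf_counter()
    accumulate_centroids(inverse, count, xcm, ycm)
    return time.perf_counter() - start, xcm, ycm


if __name__ == "__main__":
    labels = np.random.randint(0, 5, size=(64, 64))
    unique, inverse, count = np.unique(labels, return_inverse=True,
                                       return_counts=True)
    inverse = inverse.reshape(labels.shape)
    first, _, _ = timed_call(inverse, count)       # includes JIT compilation
    second, xcm, ycm = timed_call(inverse, count)  # compiled code only
    print(f"first call: {first:.4f} s, second call: {second:.6f} s")
```

With Numba installed, the first printed time should be dominated by compilation, while the second reflects only the compiled loop.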
