How to declare a 2D C array dynamically in Cython

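I need to do a lot of work with 2D numpy arrays of various sizes, and I would like to offload these calculations to Cython. The idea is that my 2D numpy arrays would be passed from Python to Cython, where they would be converted into a C array or a memoryview and used in a cascade of other C-level functions to do the calculations.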

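After some profiling I ruled out using numpy arrays directly in Cython because of serious Python overhead. Using memoryviews was much faster and quite easy, but I suspect I could get even more speedup from using C arrays.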

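Here is my question, though: how can I declare a 2D C array in Cython without predefining its dimensions with set values? For example, I can create a C array from numpy this way: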

narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))
cdef int c_arr[3][4]
for i in range(3):
    for j in range(4):
        c_arr[i][j] = narr[i][j]

and then pass it to a function:

cdef void somefunction(int c_Arr[3][4]):
    ...

But this means I have an array of fixed size, which in my case would be useless. So I tried something like this:

narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))
cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]
cdef int c_arr[a][b]                # INCORRECT - EXAMPLE ONLY
for i in range(a):
    for j in range(b):
        c_arr[i][j] = narr[i][j]

with the intent of passing it to a function like this:

cdef void somefunction(int a, int b, int c_Arr[a][b]):
    ...

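But it doesn't work: compilation fails with the error "Not allowed in a constant expression". I suspect I need to use malloc/free somehow? I had a look at this question (How to declare 2D list in Cython), but it does not answer my problem.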

UPDATE:

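It turns out that memoryviews can be as fast as C arrays, provided that IndexError (bounds) checking for memoryviews is switched off in Cython. This can be done with a Cython compiler directive: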

# cython: boundscheck=False

Thanks @Veedrac for the tip!

You just need to stop doing bounds checking:

with cython.boundscheck(False):
    thesum += x_view[i,j]

That brings the speed basically up to par.

If you really want a C array from it, try:

import numpy as numpy
from numpy import int32
from numpy cimport int32_t
numpy_array = numpy.zeros((3, 4), dtype=int32)  # example data; must be C-contiguous with 4 columns
cdef:
    int32_t[:, :] cython_view = numpy_array
    int32_t *c_integers_array = &cython_view[0, 0]
    int32_t[4] *c_2d_array = <int32_t[4] *>c_integers_array

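First you get a Numpy array. You use that to get a memoryview. Then you get a pointer to its data, which you cast to a pointer to rows of the desired stride.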
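
The pointer-cast step can be illustrated outside Cython as well. The following is only a rough sketch using Python's standard `ctypes` module, which is a stand-in chosen here for illustration and not what the answer itself uses. The names `NROWS`, `NCOLS`, `flat`, `Row` and `rows` are hypothetical. It shows that a flat, row-major buffer and a "pointer to rows" refer to the same memory, so `rows[i][j]` and `flat[i * NCOLS + j]` are the same element.

```python
import ctypes

# Illustrative dimensions (hypothetical; any positive sizes work).
NROWS, NCOLS = 3, 4

# A flat, contiguous buffer of 32-bit ints holding 1..12 in row-major order.
flat = (ctypes.c_int32 * (NROWS * NCOLS))(*range(1, NROWS * NCOLS + 1))

# Reinterpret the same memory as a pointer to rows of NCOLS ints,
# analogous to the <int32_t[4] *> cast in the Cython snippet.
Row = ctypes.c_int32 * NCOLS
rows = ctypes.cast(flat, ctypes.POINTER(Row))

# rows[i][j] and flat[i * NCOLS + j] address the same element.
for i in range(NROWS):
    for j in range(NCOLS):
        assert rows[i][j] == flat[i * NCOLS + j]

# No data was copied: a write through one name is visible through the other.
rows[1][2] = 70
assert flat[1 * NCOLS + 2] == 70
```

Note that in `ctypes` the row length is an ordinary runtime value, whereas in the Cython cast above the row length (4) is part of a C array type and so has to be a compile-time constant. When the number of columns is only known at runtime, indexing a plain pointer as `i * ncols + j` is the usual workaround.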

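So with invaluable help from @Veedrac (many thanks!) I finally came up with a script that demonstrates the use of both memoryviews and C arrays to speed up calculations in Cython. They both come down to similar speeds, so I personally think using memoryviews is much easier.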

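Here is an example Cython script that "accepts" a numpy array, converts it to a memoryview or a C array, and then performs a simple array summation via C-level functions: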

# cython: boundscheck=False
cimport cython
import numpy as np
cimport numpy as np
from numpy import int32
from numpy cimport int32_t

#Generate numpy array:
narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=np.dtype("i"))
cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]
cdef int i, j
testsum = np.sum(narr)
print("Test summation: np.sum(narr) = %d" % testsum)
#Generate the memory view:
cdef int [:,:] x_view = narr
#Generate the 2D c-array and its pointer:
cdef:
    int32_t[:, :] cython_view = narr
    int32_t *c_integers_array = &cython_view[0, 0]
    int32_t[4] *c_arr = <int32_t[4] *>c_integers_array

def test1():
    speed_test_mview(x_view)
def test2():
    speed_test_carray(&c_arr[0][0], a, b)

cdef int speed_test_mview(int[:,:] x_view):
    cdef int n, i, j, thesum
    # Sum every element through the memory view (a and b are module-level):
    for n in range(10000):
        thesum = 0
        for i in range(a):
            for j in range(b):
                thesum += x_view[i, j]
    return thesum

cdef int speed_test_carray(int32_t *c_Arr, int a, int b):
    cdef int n, i, j, thesum
    # Sum every element through the raw pointer, using row-major indexing:
    for n in range(10000):
        thesum = 0
        for i in range(a):
            for j in range(b):
                thesum += c_Arr[(i*b)+j]
    return thesum

Timing tests in the IPython shell then show similar speeds:

import testlib as t
Test summation: np.sum(narr) = 136
%timeit t.test1()
10000000 loops, best of 3: 46.3 ns per loop
%timeit t.test2()
10000000 loops, best of 3: 46 ns per loop

Oh, and for comparison: using numpy arrays in this example took 125 ms (not shown).
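
As a sanity check on the two access patterns compared above, here is a minimal plain-Python sketch. It uses only the standard `array` module and the built-in `memoryview` (not Cython's typed memoryviews), so it says nothing about speed; it only shows that `[i, j]` indexing on a 2D view and `i * b + j` indexing on the flat buffer visit the same elements. The function names `sum_view` and `sum_flat` are hypothetical.

```python
from array import array


def sum_view(view, nrows, ncols):
    """Sum a 2D memoryview with [i, j] indexing, like speed_test_mview."""
    total = 0
    for i in range(nrows):
        for j in range(ncols):
            total += view[i, j]
    return total


def sum_flat(buf, nrows, ncols):
    """Sum a flat row-major buffer with i * ncols + j, like speed_test_carray."""
    total = 0
    for i in range(nrows):
        for j in range(ncols):
            total += buf[i * ncols + j]
    return total


a, b = 4, 4
# Same values as narr (1..16), stored as one flat block of C ints.
flat = array("i", range(1, a * b + 1))
# View the same memory as an a-by-b array; no data is copied.
# The intermediate cast to bytes ("B") is required by memoryview.cast.
view = memoryview(flat).cast("B").cast("i", shape=[a, b])

print(sum_view(view, a, b), sum_flat(flat, a, b))
```

Both sums come out to 136, matching the `np.sum(narr)` value printed in the session above.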
