Segmentation fault (core dumped) at the end of the script when using ctypes with CUDA

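I'm using ctypes in Python to call some CUDA functions and keep track of pointers, but I was getting segmentation faults, so I boiled the problem down to the following.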

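Python calls a CUDA function that allocates and then frees memory on the GPU. If that is all I do, it works fine. However, if I also define a numpy array and take its squared norm (np.dot(a, a)), then, once a is large enough, I get Segmentation fault (core dumped).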

Here is the basic code.

CUDA code, debug.cu:

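#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

extern "C" {
// Allocate N floats on the GPU, then free them immediately.
void all_together(size_t N)
{
    void *d;
    size_t size = N * sizeof(float);   // size_t avoids int overflow for large N
    cudaError_t err;
    err = cudaMalloc(&d, size);
    if (err != cudaSuccess) {
        printf("cuda malloc error: %s\n", cudaGetErrorString(err));
        return;                        // d is not valid, so don't free it
    }
    err = cudaFree(d);
    if (err != cudaSuccess)
        printf("cuda free error: %s\n", cudaGetErrorString(err));
}
}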

Python code, master.py:

import numpy as np
import ctypes
from ctypes import c_size_t

dll = ctypes.CDLL('./cuda_lib.so', mode=ctypes.RTLD_GLOBAL)

def build_all_together_f(dll):
    func = dll.all_together
    func.argtypes = [c_size_t]
    func.restype = None  # all_together returns void
    return func

__pycu_all_together = build_all_together_f(dll)

if __name__ == '__main__':
    N = 5001  # if this is less, the error doesn't show up
    a = np.random.randn(N).astype('float32')
    __pycu_all_together(N)
    # toggle this line on/off to get the error
    #np.dot(a, a)
    print('end of python')
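A note on the prototype declaration: ctypes assumes a C function returns int unless restype is set, so for a void function like all_together it is worth setting restype = None explicitly. The sketch below shows the same pattern as build_all_together_f, but against the C standard library's void srand(unsigned int) so it runs without a GPU. It assumes a Unix-like system where ctypes.util.find_library("c") resolves libc; build_void_func is a hypothetical helper name.

```python
import ctypes
import ctypes.util

# Assumption: Unix-like system where find_library("c") locates the C library.
# libc stands in for ./cuda_lib.so here so the example runs without CUDA.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

def build_void_func(lib, name, argtypes):
    """Hypothetical helper: declare a C function that returns void."""
    func = getattr(lib, name)
    func.argtypes = argtypes  # ctypes converts/checks Python arguments against these
    func.restype = None       # void: the call returns None instead of a garbage int
    return func

# In C: void srand(unsigned int seed);
srand = build_void_func(libc, "srand", [ctypes.c_uint])
result = srand(42)
print(result)  # None
```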

Compile: nvcc -Xcompiler -fPIC -shared -o cuda_lib.so debug.cu

Run: python master.py
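One way to narrow down where such a crash happens is Python's faulthandler module (standard library from Python 3.3; the question's original code was Python 2, where it is only available as a separate backport). This is not part of the original question, just a sketch of a diagnostic that could go at the top of master.py; running `python -X faulthandler master.py` has the same effect. faulthandler is shut down late in interpreter finalization, so a crash that happens after that point (for example in a native library's exit-time cleanup) may produce no traceback at all.

```python
import faulthandler
import sys

# Install handlers for fatal signals (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL)
# that dump the Python traceback of every thread to stderr.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ... the rest of master.py would follow here ...
print("faulthandler enabled:", faulthandler.is_enabled())
```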

Note: this was previously a different question, but I deleted it and rewrote it to be more compact and to the point.

Updating CUDA to version 5.5 made the problem go away!
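Since the fix was a CUDA upgrade, it can help to confirm which runtime version is actually being picked up. The sketch below calls the CUDA runtime's cudaRuntimeGetVersion, which reports the version as 1000*major + 10*minor, through ctypes. It assumes ctypes.util.find_library("cudart") can locate the runtime library; on many installs the CUDA library directory is not on the default search path, in which case the function simply returns None. The helper names decode_cuda_version and cuda_runtime_version are hypothetical, and the library found this way is not necessarily the one cuda_lib.so was linked against (`ldd ./cuda_lib.so` shows that).

```python
import ctypes
import ctypes.util

def decode_cuda_version(v):
    """Split an integer of the form 1000*major + 10*minor into (major, minor)."""
    return v // 1000, (v % 1000) // 10

def cuda_runtime_version():
    """Return (major, minor) of the CUDA runtime, or None if it can't be queried."""
    path = ctypes.util.find_library("cudart")  # assumption: cudart is on the search path
    if path is None:
        return None
    try:
        cudart = ctypes.CDLL(path)
    except OSError:
        return None
    # In C: cudaError_t cudaRuntimeGetVersion(int *runtimeVersion);
    cudart.cudaRuntimeGetVersion.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cudart.cudaRuntimeGetVersion.restype = ctypes.c_int
    version = ctypes.c_int(0)
    if cudart.cudaRuntimeGetVersion(ctypes.byref(version)) != 0:  # 0 == cudaSuccess
        return None
    return decode_cuda_version(version.value)

if __name__ == "__main__":
    print(decode_cuda_version(5050))  # (5, 5)
    print(cuda_runtime_version())     # e.g. (5, 5), or None if cudart isn't found
```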
