cuda-gdb: the reported kernel is not in my code

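My original problem was that my function had a long list of arguments, which exceeded the memory allowed for arguments passed to a CUDA kernel (I don't remember how many bytes, since it has been a while since I dealt with it). My workaround was to define a new structure whose members are pointers to other structures, which I could later dereference inside the kernel.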

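This is where the current problem begins: when I try to dereference the pointers (members of the structure I created earlier) inside the kernel, cuda-gdb reports CUDA_EXCEPTION_5, Warp Out-of-range Address. On top of that, the kernel name and arguments that cuda-gdb reports for the error (the arguments are shown as "not live at this point") are not ones I created in my code.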

Now for the details.

Here are the structures involved:

typedef struct {
    int strx;
    int stry;
    int strz;
    float* el;
} manmat;
typedef struct {
    manmat *x;
    manmat *y;
    manmat *z;
} manmatvec;

Here is how I group the kernel arguments in main:

int main () {
...
...
    manmat resu0;
    resu0.strx = n+2;       resu0.stry = m+2;       resu0.strz = l+2;
    if (cudaMalloc((void**)&resu0.el,sizeof(float) * (n+2)*(m+2)*(l+2)) != cudaSuccess) cout << endl << " ERROR allocating memory for manmat resu0" << endl ;
    manmat resv0;
    resv0.strx = n+2;       resv0.stry = m+2;       resv0.strz = l+2;
    if (cudaMalloc((void**)&resv0.el,sizeof(float) * (n+2)*(m+2)*(l+2)) != cudaSuccess) cout << endl << " ERROR allocating memory for manmat resv0" << endl ;
    manmat resw0;
    resw0.strx = n+2;       resw0.stry = m+2;       resw0.strz = l+2;
    if (cudaMalloc((void**)&resw0.el,sizeof(float) * (n+2)*(m+2)*(l+2)) != cudaSuccess) cout << endl << " ERROR allocating memory for manmat resw0" << endl ;
    manmatvec residues0 ;
    residues0.x = &resu0;
    residues0.y = &resv0;
    residues0.z = &resw0;
    exec_res_std_2d <<<numBlocks2D, threadsPerBlock2D>>> (residues0, ......) ;
 .....
}

This is what happens inside the kernel:

__global__ void exec_res_std_2d (manmatvec residues, ......) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int k = blockIdx.y * blockDim.y + threadIdx.y;
    manmat *resup;
    manmat *resvp;
    manmat *reswp;
    resup = residues.x;
    resvp = residues.y;
    reswp = residues.z;
    manmat resu, resv, resw ;
    resu.strx = (*resup).strx;     //LINE 1626
    resu.stry = (*resup).stry;
    resu.strz = (*resup).strz;
    resu.el = (*resup).el;
    resv = *resvp;
    resw = *reswp;
    .....
}

And finally, this is what cuda-gdb outputs:

..................
[Launch of CUDA Kernel 1065 (exec_res_std_2d<<<(1,2,1),(32,16,1)>>>) on Device 0]
[Launch of CUDA Kernel 1066 (exec_res_bot_2d<<<(1,2,1),(32,16,1)>>>) on Device 0]
Program received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 1065, grid 1066, block (0,0,0), thread (0,2,0), device 0, sm 0, warp 2, lane 0]
0x0000000003179020 in fdivide<<<(1,2,1),(32,16,1)>>> (a=warning: Variable is not live at this point. Value is undetermined.
..., pt=warning: Variable is not live at this point. Value is undetermined.
..., cells=warning: Variable is not live at this point. Value is undetermined.
...) at ola.cu:1626
1626    ola.cu: No such file or directory.
    in ola.cu

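I should note that I have not defined ANY function, __device__ or __global__, named fdivide in my code.

Also, and this may be important, at the start of the run inside the debugger I get the following warnings, even though I compiled my CUDA C files with -arch=sm_20 -g -G -gencode arch=compute_20,code=sm_20: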

[New Thread 0x7ffff3b69700 (LWP 12465)]
[Context Create of context 0x1292340 on Device 0]
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.1619c10.o.LkkWns
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.1940ad0.o.aHtC7W
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2745680.o.bVXEWl
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2c438b0.o.cgUqiP
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2c43980.o.4diaQ4
warning: no loadable sections found in added symbol-file /tmp/cuda-dbg/12456/session1/elf.1292340.2dc9380.o.YYJAr5

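Any answers, hints, or suggestions that could help me with this problem are very welcome! Please note that I only recently started programming in CUDA C, and I am not very experienced with cuda-gdb. Most of the debugging I have done in C code was done "manually", by checking the output at various points in the code.

Also, this code runs on a Tesla M2090 and is compiled for the 2.0 architecture.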

This is going to be a problem:

manmatvec residues0 ;
    residues0.x = &resu0;
    residues0.y = &resv0;
    residues0.z = &resw0;

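The resu0, resv0, and resw0 variables are allocated in host memory, on the host stack. You are putting host addresses into the manmatvec structure and then passing the manmatvec to the kernel. On the receiving end, the CUDA code cannot access the host memory addresses stored in the structure.

If you want to pass the addresses of the resu0, resv0, and resw0 variables, you need to allocate them in device memory.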
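A minimal sketch of what that could look like, assuming the same manmat and manmatvec definitions as in the question. The helper make_device_manmat, the stand-in kernel read_strides, the wrapper run_demo, and the sizes n, m, l are hypothetical names and values chosen for illustration; they are not from the original code. Each manmat header is filled in on the host, then copied into its own device allocation, so the pointers stored in manmatvec are device addresses:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct {
    int strx;
    int stry;
    int strz;
    float* el;
} manmat;

typedef struct {
    manmat *x;
    manmat *y;
    manmat *z;
} manmatvec;

// Hypothetical helper: allocates the element buffer and a device-resident
// copy of the manmat header. Returns a DEVICE pointer, or NULL on failure.
static manmat* make_device_manmat(int sx, int sy, int sz) {
    manmat h;                                   // host-side staging copy
    h.strx = sx; h.stry = sy; h.strz = sz;
    h.el = NULL;
    size_t count = (size_t)sx * sy * sz;
    if (cudaMalloc((void**)&h.el, sizeof(float) * count) != cudaSuccess) return NULL;

    manmat* d = NULL;
    if (cudaMalloc((void**)&d, sizeof(manmat)) != cudaSuccess) {
        cudaFree(h.el);
        return NULL;
    }
    // Copy the header (strides plus the device pointer el) to the device.
    if (cudaMemcpy(d, &h, sizeof(manmat), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        cudaFree(h.el);
        return NULL;
    }
    return d;
}

// Stand-in for exec_res_std_2d: dereferences the struct pointers on the device.
__global__ void read_strides(manmatvec residues, int* out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        out[0] = residues.x->strx;
        out[1] = residues.y->stry;
        out[2] = residues.z->strz;
    }
}

int run_demo() {
    int n = 4, m = 5, l = 6;                    // placeholder sizes
    manmatvec residues0;
    residues0.x = make_device_manmat(n + 2, m + 2, l + 2);
    residues0.y = make_device_manmat(n + 2, m + 2, l + 2);
    residues0.z = make_device_manmat(n + 2, m + 2, l + 2);
    if (!residues0.x || !residues0.y || !residues0.z) return 1;

    int* d_out = NULL;
    if (cudaMalloc((void**)&d_out, 3 * sizeof(int)) != cudaSuccess) return 1;

    read_strides<<<1, 32>>>(residues0, d_out);
    cudaError_t err = cudaDeviceSynchronize();  // surfaces faults raised in the kernel
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_out[3] = {0, 0, 0};
    if (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    printf("strx=%d stry=%d strz=%d\n", h_out[0], h_out[1], h_out[2]);

    // Cleanup of the manmat allocations is omitted for brevity.
    cudaFree(d_out);
    return (h_out[0] == n + 2 && h_out[1] == m + 2 && h_out[2] == l + 2) ? 0 : 1;
}

int main() {
    return run_demo();
}
```

Note that the host can no longer read residues0.x->strx directly, because the pointer now refers to device memory. To free the el buffers later, the header would first have to be copied back with cudaMemcpy, or the el pointers kept in a host-side copy.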

I don't know if this is the whole problem, but I'm sure it is a major contributor.
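A different approach from the one above, offered only as a sketch: since each manmat is just three ints and a pointer, the structures could be embedded by value instead of by pointer, which removes the extra device allocations. The type manmatvec_val, the kernel sum_strides, and the wrapper run_by_value_demo are hypothetical names, not part of the original code. Whether this fits within the kernel parameter limit depends on how many such structures are passed, and the full argument list is not shown in the question:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct {
    int strx;
    int stry;
    int strz;
    float* el;
} manmat;

// Hypothetical by-value variant of manmatvec: the three headers travel
// inside the kernel argument itself, so no host address is ever dereferenced.
typedef struct {
    manmat x;
    manmat y;
    manmat z;
} manmatvec_val;

// Stand-in kernel: reads the strides straight from the by-value argument.
__global__ void sum_strides(manmatvec_val residues, int* out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        out[0] = residues.x.strx + residues.y.stry + residues.z.strz;
    }
}

int run_by_value_demo() {
    int n = 4, m = 5, l = 6;                    // placeholder sizes
    size_t bytes = sizeof(float) * (n + 2) * (m + 2) * (l + 2);

    manmatvec_val residues0;
    manmat* parts[3] = { &residues0.x, &residues0.y, &residues0.z };
    for (int i = 0; i < 3; ++i) {
        parts[i]->strx = n + 2;
        parts[i]->stry = m + 2;
        parts[i]->strz = l + 2;
        parts[i]->el = NULL;
        // Only el points to device memory; the header itself stays on the host
        // and is copied into the kernel's parameter space at launch.
        if (cudaMalloc((void**)&parts[i]->el, bytes) != cudaSuccess) return 1;
    }

    int* d_out = NULL;
    if (cudaMalloc((void**)&d_out, sizeof(int)) != cudaSuccess) return 1;

    sum_strides<<<1, 32>>>(residues0, d_out);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    int h_out = 0;
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    printf("sum of strides = %d\n", h_out);

    for (int i = 0; i < 3; ++i) cudaFree(parts[i]->el);
    cudaFree(d_out);
    return (h_out == (n + 2) + (m + 2) + (l + 2)) ? 0 : 1;
}

int main() {
    return run_by_value_demo();
}
```

The trade-off is that every by-value header counts against the kernel's parameter space, whereas the pointer version costs only one pointer per structure but requires the headers to live in device memory.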
