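Is there a way to force a kernel failure using cuda-gdb? I have tried stepping through the kernel code and setting invalid index positions and odd values for variables, but after continuing from those bad settings I could not trigger a "kernel execution failed" error.
Does anyone know the correct way to do this with cuda-gdb? I have read the cuda-gdb documentation twice, but I may have missed a hint on how to do this, if it is possible at all. If anyone knows of any tools or techniques, I would appreciate it. Thanks.
I am on CentOS 7, and my device has compute capability 2.1. The output of uname -a is below.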
Linux john 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Is there a way to force a kernel failure using cuda-gdb?
Yes, it is possible. Here is a complete example:
$ cat t678.cu
#include <stdio.h>

__global__ void kernel(int *data){
  int idx = 0; // line 4
  idx += data[0];
  int tval = data[idx];
  data[1] = tval;
}
int main(){
  int *d_data;
  cudaMalloc(&d_data, 32*sizeof(int));
  cudaMemset(d_data, 0, 32*sizeof(int));
  kernel<<<1,1>>>(d_data);
  cudaDeviceSynchronize();
  // report any error raised while the kernel was executing
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) printf("kernel fail %s\n", cudaGetErrorString(err));
}
$ nvcc -g -G -o t678 t678.cu
$ cuda-gdb ./t678
NVIDIA (R) CUDA Debugger
7.5 release
Portions Copyright (C) 2007-2015 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/user2/misc/t678...done.
(cuda-gdb) break t678.cu:4
Breakpoint 1 at 0x4026d5: file t678.cu, line 4.
(cuda-gdb) run
Starting program: /home/user2/misc/./t678
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff700a700 (LWP 8693)]
[Switching focus to CUDA kernel 0, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 14, warp 2, lane 0]
Breakpoint 1, kernel<<<(1,1,1),(1,1,1)>>> (data=0x13047a0000) at t678.cu:4
4 int idx = 0; // line 4
(cuda-gdb) step
5 idx += data[0];
(cuda-gdb) print idx
$1 = 0
(cuda-gdb) set idx=1000000
(cuda-gdb) step
6 int tval = data[idx];
(cuda-gdb) print idx
$2 = 1000000
(cuda-gdb) step
CUDA Exception: Device Illegal Address
The exception was triggered in device 0.
Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
kernel<<<(1,1,1),(1,1,1)>>> (data=0x13047a0000) at t678.cu:7
7 data[1] = tval;
(cuda-gdb)
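In the cuda-gdb output above, you can see that after setting the idx variable to a large value, executing the following line in the debugger causes an out-of-bounds index (illegal address) error:
  int tval = data[idx];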
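If the goal is to exercise host-side error handling, the same fault can probably be provoked without stepping in the debugger. The following is only a sketch under stated assumptions: the `check` helper and the command-line index argument are hypothetical additions that are not part of the example above, and an out-of-bounds read is not guaranteed to fault (small overruns often go unnoticed), so a very large index is assumed.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not part of the original example):
// prints the CUDA error string and returns false on failure.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Reads data[idx]; an idx far outside the 32-int allocation is
// expected (but not guaranteed) to raise an illegal address error.
__global__ void kernel(int *data, long long idx) {
    data[1] = data[idx];
}

int main(int argc, char **argv) {
    // Assumption: the index is passed on the command line; 0 stays in bounds.
    long long idx = (argc > 1) ? atoll(argv[1]) : 0;

    int *d_data = nullptr;
    if (!check(cudaMalloc(&d_data, 32 * sizeof(int)), "cudaMalloc")) return 1;
    if (!check(cudaMemset(d_data, 0, 32 * sizeof(int)), "cudaMemset")) return 1;

    kernel<<<1, 1>>>(d_data, idx);
    // Launch-time errors (bad configuration and similar) show up here.
    if (!check(cudaGetLastError(), "kernel launch")) return 1;
    // Faults raised while the kernel runs show up at the next synchronization.
    if (!check(cudaDeviceSynchronize(), "kernel execution")) return 1;

    printf("kernel ok\n");
    cudaFree(d_data);
    return 0;
}
```

Running the binary with no argument should take the in-bounds path, while passing a large value such as 1000000 mirrors the `set idx=1000000` step in the debugger session. Checking the launch and the synchronization separately makes it easier to tell a launch failure from a fault during execution.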