为什么 cuda-gdb 会启动多个线程?



当我在cuda-gdb中启动程序时,我得到的输出如下:

[New Thread 0x7fffef8ea700 (LWP 8003)]
[New Thread 0x7fffe35b2700 (LWP 8010)]
[New Thread 0x7fffe2db1700 (LWP 8011)]
[New Thread 0x7fffe25b0700 (LWP 8012)]

我不明白为什么一开始就启动了这些多线程。我尚未以多线程模式启动程序。我正在使用 MPI,但我启动了一个过程。那么,这些线索从何而来?

这不会以任何方式影响我的调试过程。只是我不明白这意味着什么。

您看到的这些线程是由 CUDA 运行时库创建的,与cuda-gdb本身没有直接关系。如果使用gdb运行相同的代码,则还将看到相同的消息。

如果您想查看这些线程正在做什么或它们来自哪里,只需使用-g标志编译代码,在代码中设置断点(例如,在 CUDA 内核启动之前(,运行它,然后在gdb控制台中运行以下命令:

thread apply all backtrace

此命令与 gdb 的backtrace具有相同的效果,只是它将显示程序创建的所有线程的回溯。

就我而言,启动程序后收到以下消息:

[New Thread 0x7fffeffb3700 (LWP 7141)]
[New Thread 0x7fffef731700 (LWP 7142)]
[New Thread 0x7fffeef30700 (LWP 7143)]

当我在gdb控制台中运行上述命令时,我看到以下输出:

(gdb) thread apply all backtrace
Thread 4 (Thread 0x7fffeef30700 (LWP 7143)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007ffff63c19b7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff6386bb7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread (arg=0x7fffeef30700) at pthread_create.c:309
#5  0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 3 (Thread 0x7fffef731700 (LWP 7142)):
#0  0x00007ffff6cc5aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff63bf6a3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff642261e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread (arg=0x7fffef731700) at pthread_create.c:309
#5  0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7fffeffb3700 (LWP 7141)):
#0  0x00007ffff6ccfa9f in accept4 (fd=13, addr=..., addr_len=0x7fffeffb2e18, flags=-1) at ../sysdeps/unix/sysv/linux/accept4.c:45
#1  0x00007ffff63c0556 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff63b404d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread (arg=0x7fffeffb3700) at pthread_create.c:309
#5  0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ffff7fc0740 (LWP 7136)):
#0  main () at cuda_heap.cu:66

您可以验证,在开始时创建的所有线程都与线程地址和 LWP(轻量级进程(ID 匹配。你可以看到它们都来自libcuda.so.1。

cuda-gdb中,您可以看到一些更详细的信息:

(cuda-gdb) thread apply all bt
Thread 4 (Thread 0x7fffeef30700 (LWP 10019)):
#0  0x00007ffff79c33f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ffff63c19b7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff6386bb7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 3 (Thread 0x7fffef731700 (LWP 10018)):
#0  0x00007ffff6cc5aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff63bf6a3 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff642261e in cuVDPAUCtxCreate () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 2 (Thread 0x7fffeffb3700 (LWP 10017)):
#0  0x00007ffff6ccfa9f in accept4 () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff63c0556 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff63b404d in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 1 (Thread 0x7ffff7fc0740 (LWP 10007)):
#0  main () at cuda_heap.cu:66

我不知道它到底是什么,但我认为 CUDA-gdb 需要创建多个线程来捕获错误/异常,例如:内存违规或银行冲突。

相关内容

  • 没有找到相关文章

最新更新