Getting execution time in nvprof

Is there a way to get kernel execution time in nvprof the same way you get a metric?

For example, to get DRAM read transactions, I type:

nvprof --metrics dram_read_transactions ./myprogram

My question is: is there something like

nvprof --metrics execution_time ./myprogram

I want to collect a small set of metrics with a single command line, without having to run

nvprof ./myprogram

as a separate command.

I believe you are looking for:

nvprof --print-gpu-trace ./myprogram

You should read this post from NVIDIA's "CUDA Pro Tip" blog series:

CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler

It walks you through the basics of using nvprof to profile and time your application. Specifically, if you run something like this:

nvprof --print-gpu-trace ./nbody --benchmark -numdevices=2 -i=1

(the example is the n-body physics simulator), your output will include something like the following:

...
==4125== Profiling application: ./nbody --benchmark -numdevices=2 -i=1
==4125== Profiling result:
Start  Duration            Grid Size      Block Size     Regs*    SSMem*    DSMem*      Size  Throughput           Device   Context    Stream  Name
260.78ms     864ns                    -               -         -         -         -        4B  4.6296MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
260.79ms     960ns                    -               -         -         -         -        4B  4.1667MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
260.93ms     896ns                    -               -         -         -         -        4B  4.4643MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
260.94ms     672ns                    -               -         -         -         -        4B  5.9524MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
268.03ms  1.3120us                    -               -         -         -         -        8B  6.0976MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
268.04ms     928ns                    -               -         -         -         -        8B  8.6207MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
268.19ms     864ns                    -               -         -         -         -        8B  9.2593MB/s   Tesla K20c (0)         2         2  [CUDA memcpy HtoD]
268.19ms     800ns                    -               -         -         -         -        8B  10.000MB/s  GeForce GTX 680         1         2  [CUDA memcpy HtoD]
274.59ms  2.2887ms             (52 1 1)       (256 1 1)        36        0B  4.0960KB         -           -   Tesla K20c (0)         2         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [242]
274.67ms  981.47us             (32 1 1)       (256 1 1)        36        0B  4.0960KB         -           -  GeForce GTX 680         1         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [257]
276.94ms  2.3146ms             (52 1 1)       (256 1 1)        36        0B  4.0960KB         -           -   Tesla K20c (0)         2         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [275]
276.99ms  979.36us             (32 1 1)       (256 1 1)        36        0B  4.0960KB         -           -  GeForce GTX 680         1         2  void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [290]

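The Duration column gives the execution time of every kernel launch (and of every memory copy), so this one command reports the timing of all your kernels.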
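If you want a single total kernel time out of that trace, a small script can add up the Duration column. The following is a minimal sketch, assuming the text layout shown above: each data row starts with a Start timestamp such as 274.59ms, the second whitespace-separated token is the Duration with an ns/us/ms/s suffix, and transfer rows are named [CUDA memcpy ...] or similar. The function names to_seconds and kernel_durations are hypothetical helpers, not part of nvprof.

```python
import re

# Seconds per unit suffix as printed in the trace (e.g. 864ns, 1.3120us, 2.2887ms).
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_TIME = re.compile(r"^(\d+(?:\.\d+)?)(ns|us|ms|s)$")


def to_seconds(token):
    """Convert a duration token such as '981.47us' to seconds."""
    m = _TIME.match(token)
    if m is None:
        raise ValueError(f"not a duration: {token!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def kernel_durations(trace_text):
    """Return kernel durations (seconds) from --print-gpu-trace text output.

    Assumption: data rows begin with a timestamp token; header, '==PID=='
    log lines and '...' do not, so they are skipped.
    """
    durations = []
    for line in trace_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or not _TIME.match(tokens[0]):
            continue  # not a data row
        if "[CUDA " in line:
            continue  # memcpy/memset row, not a kernel launch
        durations.append(to_seconds(tokens[1]))
    return durations
```

For the four integrateBodies launches in the sample output, sum(kernel_durations(text)) would come to roughly 6.56 ms. Matching on the leading timestamp keeps the parser indifferent to the device and kernel-name columns, which themselves contain spaces.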

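It is also worth running nvprof --help and spending 5-10 minutes reading through the options; for example, if you want to process the trace in a script, you will find the switch for printing the trace in CSV format.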
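As an illustration of that scripting route, here is a minimal sketch that totals kernel time per kernel name from a CSV trace. It assumes the trace was captured with something like nvprof --print-gpu-trace --csv --log-file trace.csv ./myprogram (confirm the exact switches with nvprof --help). It also assumes the CSV has a header row containing "Duration" and "Name" columns, that the row right after the header gives each column's unit (for example us), and that lines starting with == are nvprof log lines. kernel_totals_from_csv is a hypothetical helper name.

```python
import csv
from collections import defaultdict

# Seconds per unit label; assumed to cover the units written in the units row.
_UNIT_SCALE = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def kernel_totals_from_csv(text):
    """Sum kernel durations (seconds) per kernel name from a CSV GPU trace.

    Assumptions: '=='-prefixed lines are nvprof log output, the first row
    containing 'Duration' and 'Name' is the header, and the next row holds
    the unit of each column.
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("==")]
    rows = list(csv.reader(lines))
    header_idx = next(i for i, r in enumerate(rows)
                      if "Duration" in r and "Name" in r)
    header = rows[header_idx]
    dur_col = header.index("Duration")
    name_col = header.index("Name")
    scale = _UNIT_SCALE[rows[header_idx + 1][dur_col]]  # units row

    totals = defaultdict(float)
    for row in rows[header_idx + 2:]:
        if len(row) <= max(dur_col, name_col):
            continue  # short or malformed row
        name = row[name_col]
        if name.startswith("[CUDA "):
            continue  # memcpy/memset entries are not kernels
        totals[name] += float(row[dur_col]) * scale
    return dict(totals)
```

Usage would look like kernel_totals_from_csv(open("trace.csv").read()). Using the csv module rather than splitting on commas matters here because kernel signatures such as integrateBodies(vec4::Type*, ...) contain commas inside quoted fields.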
