nvprof - Warning: No profile data collected

When I try to profile a program with nvprof, I get the following output and nothing else:

<program output>
======== Warning: No profile data collected.

The code follows this classic first CUDA program. nvprof has worked on my system before, but I recently had to reinstall CUDA.

I tried the suggestions in this post, which include calling cudaDeviceReset() and cudaProfilerStart/Stop() and passing an extra profiling flag (nvprof --unified-memory-profiling off), but had no luck.

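This NVIDIA developer forum post appears to describe a similar error, but the advice there is to use a compiler other than nvcc because of some OpenACC libraries, which I do not use.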

System specs

System: Windows 11 x64 using WSL2
CPU: i7 8750H
GPU: GTX 1050 Ti
CUDA version: 11.8

For completeness, I have included my program code, although I suspect the cause lies with my system:

Compile:

nvcc add.cu -o add_cuda

Profile:

nvprof ./add_cuda

add.cu:

#include <iostream>
#include <math.h>
#include <cuda_profiler_api.h>

// function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20; // 1M elements

  cudaProfilerStart();

  // Allocate Unified Memory -- accessible from CPU or GPU
  float *x, *y;
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  cudaDeviceReset();
  cudaProfilerStop();

  return 0;
}

How can I resolve this so that nvprof reports actual profiling information?

According to the documentation, profiling is currently not supported in CUDA on WSL. That is why no profile data is collected when you run nvprof.
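
If it helps to make this cause visible from inside the program, the host code can check whether it is running under WSL before anyone reaches for nvprof. The following is only a minimal sketch: the helper names looks_like_wsl and read_kernel_version are hypothetical, and the check relies on the assumption that WSL kernels report "microsoft" in /proc/version, which is a common heuristic rather than anything documented by NVIDIA.

```cpp
#include <algorithm>
#include <cctype>
#include <fstream>
#include <iterator>
#include <string>

// Heuristic (an assumption, not an NVIDIA-documented check): WSL kernels
// identify themselves with "microsoft" somewhere in their version string.
// The comparison is case-insensitive.
bool looks_like_wsl(std::string version_text)
{
    std::transform(version_text.begin(), version_text.end(), version_text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return version_text.find("microsoft") != std::string::npos;
}

// Reads the whole of /proc/version.
// Returns an empty string if the file cannot be opened.
std::string read_kernel_version()
{
    std::ifstream in("/proc/version");
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling looks_like_wsl(read_kernel_version()) at the top of main and printing a short note when it returns true would explain an empty nvprof report up front. It does not make profiling work; it only reports the likely reason.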
