nvprof warning on CUDA_VISIBLE_DEVICES

When I set os.environ['CUDA_VISIBLE_DEVICES'] in PyTorch, I get the following message:

Warning: Device on which events/metrics are configured are different than the device on which it is being profiled. One of the possible reason is setting CUDA_VISIBLE_DEVICES inside the application.

What does this actually mean? And how can I avoid it while still using CUDA_VISIBLE_DEVICES (rather than torch.cuda.set_device())?

Here is the PyTorch code, test.py:

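import torch
import torch.nn as nn  # needed for nn.Sequential, nn.Conv2d and nn.ReLU below
import os

# Set inside the application, after torch has already been imported.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

g = 1     # number of convolution groups
c1 = 512  # input channels
c2 = 512  # output channels
input = torch.randn(64, c1, 28, 28).cuda()

# Five 1x1 convolutions, each followed by a ReLU.
model = nn.Sequential(
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
).cuda()
out = model(input)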

And the command:

nvprof --analysis-metrics -o metrics python test.py

What does this actually mean?

It means that nvprof started profiling your code on a GPU context that you then made unavailable by setting CUDA_VISIBLE_DEVICES.

How can I avoid it while still using CUDA_VISIBLE_DEVICES (rather than torch.cuda.set_device())?

Probably something like this:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
import torch
....

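I know nothing about PyTorch, but I would guess that importing the library triggers a lot of CUDA activity that you don't see. If you import the library after setting CUDA_VISIBLE_DEVICES, I suspect the whole problem will disappear.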
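If the import order really is what matters, one way to keep it from drifting is to guard the assignment. The sketch below is only an illustration under that assumption: `set_visible_devices` is a hypothetical helper, not part of PyTorch or CUDA, and finding `torch` in `sys.modules` only shows that the module was imported, not that CUDA was actually initialized.

```python
import os
import sys


def set_visible_devices(devices, modules=None):
    """Set CUDA_VISIBLE_DEVICES, refusing if torch is already imported.

    `modules` defaults to sys.modules; it is a parameter only so the
    check can be exercised on a machine without torch installed.
    """
    modules = sys.modules if modules is None else modules
    if "torch" in modules:
        raise RuntimeError(
            "torch is already imported; set CUDA_VISIBLE_DEVICES earlier"
        )
    os.environ["CUDA_VISIBLE_DEVICES"] = devices
    return devices
```

In test.py this would be called as `set_visible_devices("1")` at the very top of the file, before `import torch`.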

If that doesn't work, then you have no choice but to not set CUDA_VISIBLE_DEVICES in the Python code at all, and instead do this:

CUDA_VISIBLE_DEVICES=1 nvprof --analysis-metrics -o metrics python test.py
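If the script should also keep working when it is launched without that prefix, one option is to let a value supplied by the shell take precedence and only fall back to a default when the variable is unset. This is a minimal sketch under that assumption; `default_visible_devices` is a hypothetical name, not an existing API.

```python
import os


def default_visible_devices(fallback):
    """Return the effective CUDA_VISIBLE_DEVICES, setting it only if unset.

    A value exported by the launching shell (for example on the nvprof
    command line) is left untouched, so the process environment is not
    modified from inside the application in that case.
    """
    return os.environ.setdefault("CUDA_VISIBLE_DEVICES", fallback)
```

When the variable is not set by the launcher, the fallback is still applied from inside the application, so a profiled run should still pass it on the command line as shown above.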
