I built a toy CNN model:
import torch
import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 300, 3),
            nn.Conv2d(300, 500, 3),
            nn.Conv2d(500, 1000, 3),
        )
        # 1000 channels * 58 * 58 spatial = 3,364,000 features for a 64x64 input
        self.fc = nn.Linear(3364000, 1)

    def forward(self, x):
        out = self.conv(x)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out
Then I inspected the model, summarizing it with this code (summary_ is torchsummary's summary):
from torchsummary import summary as summary_

model = Test()
model.to('cuda')
for param in model.parameters():
    print(param.dtype)
    break
summary_(model, (3, 64, 64))
and got the following output:
torch.float32
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 300, 62, 62] 8,400
Conv2d-2 [-1, 500, 60, 60] 1,350,500
Conv2d-3 [-1, 1000, 58, 58] 4,501,000
Linear-4 [-1, 1] 3,364,001
================================================================
Total params: 9,223,901
Trainable params: 9,223,901
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 48.20
Params size (MB): 35.19
Estimated Total Size (MB): 83.43
----------------------------------------------------------------
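As a sanity check, the reported Params size matches the parameter count at 4 bytes (float32) per parameter:

```python
# 9,223,901 parameters * 4 bytes each, converted to MiB
total_params = 9_223_901
params_mb = total_params * 4 / 1024**2
print(round(params_mb, 2))  # 35.19
```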
I want to reduce the model size so that I can increase the batch size. So I converted torch.float32 -> torch.float16 via NVIDIA/apex:
import torch.optim as optim
from apex import amp

model = Test()
model.to('cuda')
opt_level = 'O3'
optimizer = optim.Adam(model.parameters(), lr=0.001)
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
for param in model.parameters():
    print(param.dtype)
    break
summary_(model, (3, 64, 64))
Selected optimization level O3: Pure FP16 training.
Defaults for this optimization level are:
enabled : True
opt_level : O3
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : False
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O3
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : False
master_weights : False
loss_scale : 1.0
torch.float16
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 300, 62, 62] 8,400
Conv2d-2 [-1, 500, 60, 60] 1,350,500
Conv2d-3 [-1, 1000, 58, 58] 4,501,000
Linear-4 [-1, 1] 3,364,001
================================================================
Total params: 9,223,901
Trainable params: 9,223,901
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 48.20
Params size (MB): 35.19
Estimated Total Size (MB): 83.43
----------------------------------------------------------------
The parameters' torch.dtype changed from torch.float32 to torch.float16, but Param size (MB): 35.19 stayed the same. Why does this happen? Thanks.
Mixed precision does not mean your model becomes half its original size. By default, the parameters are kept in float32 and are automatically cast to float16 for certain operations during neural network training. The same applies to the input data. Note also that torchsummary estimates Params size (MB) by multiplying the parameter count by 4 bytes regardless of the actual dtype, so the reported 35.19 MB does not change even when the weights really are float16 (as with apex's O3 level).
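A minimal sketch of this behavior with torch.autocast (using the CPU backend and bfloat16 here so it runs without a GPU; on a GPU you would use device_type="cuda" with float16):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 4)   # parameters are stored in float32
x = torch.randn(2, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)          # the matmul runs in lower precision

print(model.weight.dtype)  # torch.float32 -- storage is unchanged
print(y.dtype)             # torch.bfloat16 -- cast only inside autocast
```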
torch.cuda.amp provides automatic casting from float32 to float16 for certain training operations, such as convolutions. Your model size will stay the same. Reducing the stored model size is called quantization, which is different from mixed-precision training.
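A minimal sketch of what quantization looks like, using PyTorch's built-in dynamic quantization (a different mechanism from apex/amp; qint8 weights take 1 byte each instead of 4):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

fp32_model = nn.Sequential(nn.Linear(64, 32))

# Replace Linear layers with dynamically quantized versions (int8 weights)
int8_model = quantize_dynamic(fp32_model, {nn.Linear}, dtype=torch.qint8)

print(int8_model[0])  # DynamicQuantizedLinear(...)
```

Unlike mixed precision, this actually shrinks the stored weights, at the cost of reduced numeric precision at inference time.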
You can read more about mixed-precision training on NVIDIA's blog and PyTorch's blog.