I've run into a very strange error while using PyTorch. I'm synthesizing video with the model below, and I'm trying to apply transfer learning to the encoder only. To do that, I froze the generator's weights with requires_grad = False and did the opposite for the encoder.
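The freezing is done with a loop along the following lines (a minimal sketch; generator and encoder are the submodule names shown in the model printout below):

# freeze every generator parameter, leave the encoder trainable
for p in model.generator.parameters():
    p.requires_grad = False
for p in model.encoder.parameters():
    p.requires_grad = True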
Here is my model:
autoencoder(
(generator): VideoGenerator(
(recurrent): GRUCell(10, 10)
(main): Sequential(
(0): ConvTranspose2d(64, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace)
(6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace)
(9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): ReLU(inplace)
(12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(13): Tanh()
)
)
(encoder): Sequential(
(0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(1): ReLU(inplace)
(2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(3): ReLU(inplace)
(4): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(5): ReLU(inplace)
(6): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(7): ReLU(inplace)
(8): Conv2d(512, 64, kernel_size=(4, 4), stride=(1, 1), bias=False)
)
)
Here is the output when I loop over the parameters and print params.requires_grad:
*** encoder ***
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
*** generator ***
18 True
19 True
20 True
21 True
22 True
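The loop that prints this is roughly the following (a sketch; the exact counter and formatting may differ):

i = 0
for name, module in [("encoder", model.encoder), ("generator", model.generator)]:
    print("***", name, "***")
    for p in module.parameters():
        i += 1
        print(i, p.requires_grad)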
But this produces a runtime error, shown below:
RuntimeError Traceback (most recent call last)
<ipython-input-7-73fe2d39b929> in on_wl_clicked(b)
87 # print(model)
88
---> 89 loss.backward()
90
91 # show_state("AFTER BACKWARD STEP:", model)
/usr/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
100 products. Defaults to ``False``.
101 """
--> 102 torch.autograd.backward(self, gradient, retain_graph, create_graph)
103
104 def register_hook(self, hook):
/usr/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
88 Variable._execution_engine.run_backward(
89 tensors, grad_tensors, retain_graph, create_graph,
---> 90 allow_unreachable=True) # allow_unreachable flag
91
92
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
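For what it's worth, the same error can be reproduced in isolation whenever nothing upstream of a scalar requires grad (a minimal sketch, independent of the model above):

import torch

x = torch.randn(4, 3)       # requires_grad is False by default
loss = (x ** 2).sum()       # loss has no grad_fn: nothing upstream requires grad
loss.backward()             # raises the same RuntimeError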
Strangely, when I set requires_grad = True on tensor 0 (see below), it runs but fails to converge:
*** encoder ***
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
*** generator ***
18 True
19 True
20 True
21 True
22 True
Any ideas on where this error comes from or how to fix it?
The problem is that the loss tensor inherits its requires_grad from the top of the chain rule: since the first tensor has requires_grad = False, the loss tensor ends up with requires_grad = False as well.
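You can check this propagation directly: a result's requires_grad becomes True as soon as any of its inputs requires grad, which is also why flipping a single upstream tensor to True made it run (a sketch, independent of the model above):

import torch

x = torch.randn(4, 3)                  # frozen "parameter"
print((x ** 2).sum().requires_grad)    # False: the loss inherits the flag

x.requires_grad_(True)                 # unfreeze one upstream tensor
print((x ** 2).sum().requires_grad)    # True: the flag propagates down to the loss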
The fix is simple: add the following line before calling backward:
loss.requires_grad = True
loss.backward()
Another solution is to change the optimizer, like this:
# optimizer = optim.Adam(params=model.parameters(), lr=lr)
optimizer = optim.Adam(params=model.encoder.parameters(), lr=lr)
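Putting that second approach together, here is a sketch of what the training step could look like (criterion, videos, and targets are assumed names, and the encoder's parameters are assumed to have requires_grad = True as intended):

import torch.optim as optim

# optimize only the encoder; the generator stays frozen
optimizer = optim.Adam(model.encoder.parameters(), lr=lr)

optimizer.zero_grad()
output = model(videos)              # forward pass through the full autoencoder
loss = criterion(output, targets)   # assumed reconstruction loss
loss.backward()                     # gradients flow only into tensors with requires_grad = True
optimizer.step()                    # updates only the encoder's parameters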