Toy example
Consider this very simple gradient descent implementation, in which I try to fit a linear regression (mx + b) to some toy data.
import torch
# Make some data
torch.manual_seed(0)
X = torch.rand(35) * 5
Y = 3 * X + torch.rand(35)
# Initialize m and b
m = torch.rand(size=(1,), requires_grad=True)
b = torch.rand(size=(1,), requires_grad=True)
# Pass 1
yhat = X * m + b # Calculate yhat
loss = torch.sqrt(torch.mean((yhat - Y)**2)) # Calculate the loss
loss.backward() # Reverse mode differentiation
m = m - 0.1*m.grad # update m
b = b - 0.1*b.grad # update b
m.grad = None # zero out m gradient
b.grad = None # zero out b gradient
# Pass 2
yhat = X * m + b # Calculate yhat
loss = torch.sqrt(torch.mean((yhat - Y)**2)) # Calculate the loss
loss.backward() # Reverse mode differentiation
m = m - 0.1*m.grad # ERROR
The first pass runs fine, but the second pass errors out on its last line, m = m - 0.1*m.grad.
Error
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:417.)
return self._grad
My understanding of why this happens is that during Pass 1, the line
m = m - 0.1*m.grad
copies m into a brand-new tensor (i.e., a completely separate block of memory). So m goes from being a leaf tensor to a non-leaf tensor, and, as the warning says, backward() no longer populates its .grad attribute, which is why the update in Pass 2 fails.
# Pass 1
...
print(f"{m.is_leaf}") # True
m = m - 0.1*m.grad
print(f"{m.is_leaf}") # False
So, how should the update be performed?
I have seen it mentioned that you can use something like m.data = m - 0.1*m.grad, but I haven't seen much discussion of that technique.
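For reference, here is a minimal sketch of what I assume that .data variant would look like on the same toy setup (assigning through .data writes new values into the tensor without going through autograd):

```python
import torch

torch.manual_seed(0)
X = torch.rand(35) * 5
Y = 3 * X + torch.rand(35)
m = torch.rand(size=(1,), requires_grad=True)
b = torch.rand(size=(1,), requires_grad=True)

yhat = X * m + b
loss = torch.sqrt(torch.mean((yhat - Y)**2))
loss.backward()

# Assigning to .data rebinds m's underlying values without creating
# a new tensor in the autograd graph, so m remains a leaf tensor.
m.data = m - 0.1 * m.grad
b.data = b - 0.1 * b.grad
print(m.is_leaf, b.is_leaf)  # True True
```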
Your observation is correct. To perform the update you should:
- Apply the modification with in-place operators.
- Wrap the calls in a torch.no_grad context manager.
For example:
with torch.no_grad():
    m -= 0.1*m.grad # update m
    b -= 0.1*b.grad # update b
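Putting it together, a sketch of the corrected training loop on the same toy data (the learning rate and loss are taken from the question; the 100-step count is an arbitrary choice for illustration). Gradients are reset to None after each step, as in the original:

```python
import torch

torch.manual_seed(0)
X = torch.rand(35) * 5
Y = 3 * X + torch.rand(35)

m = torch.rand(size=(1,), requires_grad=True)
b = torch.rand(size=(1,), requires_grad=True)

for _ in range(100):
    yhat = X * m + b                              # forward pass
    loss = torch.sqrt(torch.mean((yhat - Y)**2))  # RMSE loss
    loss.backward()                               # reverse-mode differentiation
    with torch.no_grad():
        m -= 0.1 * m.grad  # in-place update; m stays a leaf tensor
        b -= 0.1 * b.grad
    m.grad = None  # zero out gradients for the next pass
    b.grad = None

print(m.is_leaf, b.is_leaf)  # True True
print(float(loss))           # should be far below the initial loss
```

Because the updates happen in place under no_grad, m and b keep their identity as leaf tensors, so backward() populates their .grad on every pass and no reassignment error occurs.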