How is dy (the upstream gradient in TensorFlow) computed in the code below?

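In the code below:

dy comes out as 1. How is this value computed (what is the math behind it)? According to the tf.custom_gradient guide, dy is the upstream gradient.

Why is the final gradient multiplied by the clip_norm value (0.6)? In other words, the final gradient of (v * v) ends up multiplied by 0.6. The gradient of v * v is 2v, so where does the factor of 0.6 come from?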

import tensorflow as tf

@tf.custom_gradient
def clip_gradients(y):
    # Forward pass: return y unchanged (identity).
    print('y', y)
    def backward(dy):
        # Backward pass: dy is the upstream gradient; clip its norm to 0.6.
        print('dy', dy)
        return tf.clip_by_norm(dy, 0.6)
    return y, backward

v = tf.Variable(3.0)
with tf.GradientTape() as t:
    output = clip_gradients(v * v)
    print('output', output)
print('Final Gradient is ', t.gradient(output, v))

Output of the code:

y tf.Tensor(9.0, shape=(), dtype=float32)
output tf.Tensor(9.0, shape=(), dtype=float32)
dy tf.Tensor(1.0, shape=(), dtype=float32)
Final Gradient is  tf.Tensor(3.6000001, shape=(), dtype=float32)

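dy is initialized to 1.0 at the start of backpropagation because that is the derivative of the identity function: t.gradient(output, v) starts from d(output)/d(output) = 1. By the chain rule, f(g(x))' = f'(g(x)) * g'(x). If f is the identity function (f(x) = x), that expression becomes 1 * g'(x).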
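One way to see that the 1.0 is just the default seed is to pass a different one. The sketch below assumes TensorFlow 2.x in eager mode and uses the output_gradients argument of GradientTape.gradient; the helper name grad_with_seed is made up for this illustration and is not part of the original code.

```python
import tensorflow as tf

@tf.custom_gradient
def clip_gradients(y):
    def backward(dy):
        # dy is whatever upstream gradient the tape hands in.
        return tf.clip_by_norm(dy, 0.6)
    return y, backward

def grad_with_seed(seed=None):
    # Hypothetical helper: rebuilds the example and returns d(output)/dv.
    v = tf.Variable(3.0)
    with tf.GradientTape() as t:
        output = clip_gradients(v * v)
    # With seed=None the tape starts from ones, i.e. dy = 1.0.
    return float(t.gradient(output, v, output_gradients=seed))

default_grad = grad_with_seed()                      # seed 1.0, clipped to 0.6
small_seed_grad = grad_with_seed(tf.constant(0.5))   # seed 0.5, below the clip norm
print(default_grad, small_seed_grad)
```

With the default seed the result should match the 3.6 from the question. With a seed of 0.5 nothing is clipped, so the result should be close to 0.5 * 2 * 3 = 3.0.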

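The backward function of clip_gradients rescales any gradient whose norm is above 0.6 so that its norm becomes 0.6; for a positive scalar, that means any value above 0.6 becomes 0.6. The initial value of dy is 1.0 (as explained above), so it is clipped to 0.6.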
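To make the clipping rule concrete, here is a rough pure-Python sketch of L2-norm clipping. It is meant to mirror the documented behaviour of tf.clip_by_norm on a flat list of numbers and is not TensorFlow's actual implementation; the function name clip_by_norm_sketch is an assumption of this example.

```python
import math

def clip_by_norm_sketch(values, clip_norm):
    # L2 norm of the whole list.
    norm = math.sqrt(sum(x * x for x in values))
    # If norm <= clip_norm the scale is 1 (no change);
    # otherwise the list is rescaled so its norm equals clip_norm.
    scale = clip_norm / max(norm, clip_norm)
    return [x * scale for x in values]

clipped_one = clip_by_norm_sketch([1.0], 0.6)        # norm 1.0 > 0.6, gets rescaled
untouched = clip_by_norm_sketch([0.5], 0.6)          # norm 0.5 <= 0.6, left alone
clipped_vec = clip_by_norm_sketch([3.0, 4.0], 0.6)   # norm 5.0, direction preserved
print(clipped_one, untouched, clipped_vec)
```

Note that the rule scales the whole tensor by one factor rather than capping each element separately, which is why the direction of a vector gradient is preserved.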

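If we apply the chain rule to your example, we have:

clip_gradients (identity in the forward pass): the derivative is 1.0, which is then clipped to 0.6.

v*v: the derivative is 2*v.

Multiplying the two factors, the chain rule gives a final gradient of 0.6*2*v, which equals 3.6 when v=3.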
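The same arithmetic can be written out by hand. This is only a sketch of the backward pass for this one scalar example in plain Python, not how TensorFlow computes it internally; the function name and its parameters are made up for illustration.

```python
def clipped_square_gradient(v, clip_norm=0.6, upstream=1.0):
    # Step 1: backpropagation starts with the seed d(output)/d(output) = 1.0.
    dy = upstream
    # Step 2: the custom backward function clips the norm of dy
    # (for a scalar the norm is just the absolute value).
    dy_clipped = dy * clip_norm / max(abs(dy), clip_norm)
    # Step 3: the local derivative of v * v with respect to v is 2 * v.
    local_grad = 2.0 * v
    # Step 4: chain rule, upstream gradient times local derivative.
    return dy_clipped * local_grad

manual_grad = clipped_square_gradient(3.0)
print(manual_grad)  # close to 3.6, up to floating-point rounding
```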
