Train cost produces NaN values in TensorFlow code example



I believe this is a simple question for anyone who specializes in TensorFlow, but I haven't been able to solve it.

I am trying to run the following code from GitHub:

https://github.com/drhuangliwei/An-Attention-based-Spatiotemporal-LSTM-Network-for-Next-POI-Recommendation

When I run AT-LSTM.py, the code at line 240 produces the output below:

if global_steps % 100 == 0:
    print("the %i step, train cost is: %f" % (global_steps, cost))
global_steps += 1

Output

the 100 step, train cost is: nan
the 200 step, train cost is: nan
the 300 step, train cost is: nan
the 400 step, train cost is: nan
the 500 step, train cost is: nan
the 600 step, train cost is: nan
the 700 step, train cost is: nan
the 800 step, train cost is: nan
the 900 step, train cost is: nan
the 1000 step, train cost is: nan
the 1100 step, train cost is: nan
the 1200 step, train cost is: nan
the 1300 step, train cost is: nan
the 1400 step, train cost is: nan
the 1500 step, train cost is: nan
the 1600 step, train cost is: nan
the 1700 step, train cost is: nan
the 1800 step, train cost is: nan
the 1900 step, train cost is: nan
the 2000 step, train cost is: nan
the 2100 step, train cost is: nan
the 2200 step, train cost is: nan
the 2300 step, train cost is: nan
the 2400 step, train cost is: nan
the 2500 step, train cost is: nan
the 2600 step, train cost is: nan
the 2700 step, train cost is: nan
the 2800 step, train cost is: nan
the 2900 step, train cost is: nan
the 3000 step, train cost is: nan
the 3100 step, train cost is: nan
the 3200 step, train cost is: nan

The cost is NaN at every iteration. Do you know why I am getting NaN values on each iteration?

A common cause of this in RNNs/LSTMs is exploding gradients, which you can avoid with gradient clipping via the tf.clip_by_* ops (see "How to apply gradient clipping in TensorFlow?").
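A minimal graph-mode (TF1) sketch of clipping by value before applying gradients, matching the style of the repo's code; the stand-in cost and the learning rate are assumptions, not values from AT-LSTM.py:

import tensorflow as tf

# Stand-in cost so the sketch runs; replace with the cost tensor from AT-LSTM.py.
w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
loss = tf.square(w - 1.0)

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)  # learning rate is illustrative
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)  # clip each gradient elementwise
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)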

You can also get this from negative labels or an excessively large learning rate. Also, check your weight initialization.
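For the initialization check, one option is TF1's Glorot (Xavier) initializer; this is a hedged sketch, and the shape below is illustrative rather than taken from the repo:

import tensorflow as tf

# Glorot/Xavier initialization scales initial weights so activation variance
# stays roughly constant across layers, avoiding extreme early gradients.
# The shape [128, 256] is illustrative, not from AT-LSTM.py.
W = tf.get_variable("W", shape=[128, 256],
                    initializer=tf.glorot_uniform_initializer())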

There are a few potential reasons this happens. The most common answers here are:

  • Exploding gradients
  • Vanishing gradients

Exploding gradients occur when the gradients "explode" into very large numbers. This can be controlled with gradient clipping. A common approach is to clip by norm before applying the gradients. If you control your own train_step, you can do something like this:

with tf.GradientTape() as tape:
    logits = self(x_batch, training=True)
    loss = self.compiled_loss(y_true, logits)
# backprop
grads = tape.gradient(loss, self.trainable_weights)
grads = [
    tf.clip_by_norm(g, self.gradient_clip_norm)  # tunable parameter
    for g in grads
]
self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
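A common variant (not in the answer above) is tf.clip_by_global_norm, which rescales all gradients jointly so their combined norm stays under a threshold; this sketch reuses the names from the snippet above:

grads = tape.gradient(loss, self.trainable_weights)
grads, _ = tf.clip_by_global_norm(grads, 5.0)  # 5.0 is an arbitrary example threshold
self.optimizer.apply_gradients(zip(grads, self.trainable_weights))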

The other case is vanishing gradients, which can occur in networks where the error signal fails to propagate all the way through. This can be caused by a few things:

  • Your learning rate may be too high
  • Your network may be very deep

You can use a lower learning rate as an initial fix, but if that still doesn't work, you can explore residual connections in your network architecture, which help with vanishing gradients.
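A minimal Keras sketch of a residual (skip) connection, with illustrative layer sizes rather than the repo's architecture:

import tensorflow as tf

# The identity path lets the error signal bypass the dense layers, so
# gradients survive even when the transformed branch saturates.
inputs = tf.keras.Input(shape=(64,))
h = tf.keras.layers.Dense(64, activation="relu")(inputs)
h = tf.keras.layers.Dense(64)(h)
outputs = tf.keras.layers.Add()([inputs, h])  # skip connection
model = tf.keras.Model(inputs, outputs)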
