当模型在训练时,在某一步之后,它显示为loss=nan



我正在进行TensorFlow对象检测。我正在使用谷歌Colab。当模型在训练时,在某一步之后,它显示为loss=nan。我该怎么解决这个问题?

型号=ssd_efficientdet_d2

输出=

I1125 09:30:20.814607 139701278168960 model_lib_v2.py:652] Step 1400 per-step time 0.418s loss=1.650 INFO:tensorflow:Step 1500 per-step time 0.601s loss=1.285
I1125 09:31:09.918310 139701278168960 model_lib_v2.py:652] Step 1500 per-step time 0.601s loss=1.285
INFO:tensorflow:Step 1500 per-step time 0.601s loss=1.285
I1125 09:31:09.918310 139701278168960 model_lib_v2.py:652] Step 1500 per-step time 0.601s loss=1.285
INFO:tensorflow:Step 1600 per-step time 0.444s loss=1.344
I1125 09:31:59.594189 139701278168960 model_lib_v2.py:652] Step 1600 per-step time 0.444s loss=1.344
INFO:tensorflow:Step 1700 per-step time 0.511s loss=nan
I1125 09:32:49.015780 139701278168960 model_lib_v2.py:652] Step 1700 per-step time 0.511s loss=nan
INFO:tensorflow:Step 1800 per-step time 0.576s loss=nan
I1125 09:33:39.257319 139701278168960 model_lib_v2.py:652] Step 1800 per-step time 0.576s loss=nan
INFO:tensorflow:Step 1900 per-step time 0.439s loss=nan
I1125 09:34:27.547188 139701278168960 model_lib_v2.py:652] Step 1900 per-step time 0.439s loss=nan
INFO:tensorflow:Step 2000 per-step time 0.445s loss=nan
I1125 09:35:17.008013 139701278168960 model_lib_v2.py:652] Step 2000 per-step time 0.445s loss=nan
INFO:tensorflow:Step 2100 per-step time 0.490s loss=nan
I1125 09:36:08.541600 139701278168960 model_lib_v2.py:652] Step 2100 per-step time 0.490s loss=nan
INFO:tensorflow:Step 2200 per-step time 0.697s loss=nan

我见过很多事情会使模型产生偏差,这可能会导致损失增加或准确性下降。

  1. 可能是由于学习率高,所以首先要降低学习率
  2. 如果使用的分类器正确,请检查分类器DNNClassifier
  3. 检查标签是否正确,是否在损失函数域中
  4. 同时检查损失函数。有时,这是因为输入数据不符合损失函数
  5. 确保数据已正确规范化。您可能希望像素在[-1,1]的范围内,而不是[0255]

最新更新