I am trying to train a model. At first I had a dataset of 5,000 images and training worked fine. Now I have added some more images, and my dataset contains 6,423 images. I am using Python 3.6.1 on Ubuntu 18.04, my TensorFlow version is 1.15 and my NumPy version is 1.16 (I had the same versions before, when it worked fine). Now when I run:
python model_main.py --logtostderr --pipeline_config_path=training/faster_rcnn_resnet50_coco.config --model_dir=training
It spends a few minutes setting up, and after the following lines:
INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt.
I1123 10:26:21.548237 140482563244160 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt.
2019-11-23 10:28:30.801453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
I get the following error:
2019-11-23 10:08:38.843259: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.843323: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.843345: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851405: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851488: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851512: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851807: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851848: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851899: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]
     [[{{node IteratorGetNext}}]]
     [[ToAbsoluteCoordinates_118/Assert/AssertGuard/Assert/data_0/_5709]]
  (1) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]
     [[{{node IteratorGetNext}}]]
0 successful operations.
0 derived errors ignored.
and training stops.
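It looks like the new images you added have a resolution of 585x1024 (height x width), which does not match the shape of the batch they are being added to (600x799).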
If so, the solution is to resize these new images accordingly.
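The two shapes in the error may also come out of the resizer itself rather than straight from the image files. A minimal sketch, assuming the config uses keep_aspect_ratio_resizer with min_dimension: 600 and max_dimension: 1024 (my understanding of the stock faster_rcnn_resnet50_coco.config, so check your own file): the shorter side is scaled to 600 unless that would push the longer side past 1024, in which case the longer side is capped at 1024 instead. The function name and the two input sizes are hypothetical, and the rounding only approximates what TensorFlow does.

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Approximate (height, width) after a keep-aspect-ratio resize.

    Hypothetical helper: scale the shorter side to min_dimension; if the longer
    side would then exceed max_dimension, scale the longer side to max_dimension.
    Rounding is an assumption and may differ from TensorFlow's implementation.
    """
    scale = min_dimension / min(height, width)
    if max(height, width) * scale > max_dimension:
        scale = max_dimension / max(height, width)
    return int(round(height * scale)), int(round(width * scale))


# Two hypothetical source images with different aspect ratios.
print(keep_aspect_ratio_size(480, 639))   # -> (600, 799)
print(keep_aspect_ratio_size(720, 1260))  # -> (585, 1024)
```

Under that assumption, two images with different aspect ratios end up with exactly the two shapes from the error, so they cannot be stacked into the same batch even though both were resized.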
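If you need a batch size > 1, you can resize all images to a uniform size by choosing the appropriate image_resizer in the config. It is one of the resizers defined in the image_resizer protobuf file, which I assume is what is used to parse that section of the config.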
For example (taken from here):
image_resizer {
  fixed_shape_resizer {
    height: 600
    width: 800
  }
}
This seems to have solved the problem for me.
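A quick way to see what the fixed_shape_resizer example trades away: every image becomes exactly 600x800, so batching always works, but height and width are scaled by different factors whenever the source aspect ratio is not 3:4 (height:width). The helper name and the sample sizes below are hypothetical, for illustration only.

```python
def fixed_shape_scales(height, width, target_height=600, target_width=800):
    """Return the (y, x) scale factors needed to stretch an image to the target size."""
    return target_height / height, target_width / width


# Hypothetical source sizes with different aspect ratios.
for h, w in [(480, 639), (720, 1260)]:
    sy, sx = fixed_shape_scales(h, w)
    # When sy != sx the image is stretched, i.e. its aspect ratio changes.
    print(f"{h}x{w} -> 600x800: y-scale {sy:.3f}, x-scale {sx:.3f}")
```

The first image is scaled almost uniformly, while the second is squeezed much more horizontally than vertically; whether that distortion matters depends on the objects you are detecting.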
Changing batch_size to 1 solved the problem for me.
Within a mini-batch, all images must have the same size, so you either have to resize all photos to the same size or set the batch size to 1.
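A plain-Python sketch of that constraint: a batch can only be stacked if every element has the same shape, so with batch_size 1 the check can never fail. make_batches is a hypothetical helper that only mimics the shape check behind the "Cannot add tensor to the batch" error; it is not how TensorFlow implements batching.

```python
def make_batches(shapes, batch_size):
    """Group image shapes into batches, refusing to mix shapes within one batch."""
    batches = []
    for start in range(0, len(shapes), batch_size):
        batch = shapes[start:start + batch_size]
        first = batch[0]
        for shape in batch[1:]:
            if shape != first:
                # Same idea as TensorFlow's error: elements of a batch must match.
                raise ValueError(
                    f"Cannot add tensor to the batch: tensor {list(shape)}, batch {list(first)}"
                )
        batches.append(batch)
    return batches


shapes = [(600, 799, 3), (585, 1024, 3)]
print(len(make_batches(shapes, batch_size=1)))  # each batch holds a single image
# make_batches(shapes, batch_size=2) raises ValueError because the shapes differ.
```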
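I just removed the data augmentation and it worked for me. If needed, you can also try removing the data augmentation options one at a time... but removing all of them worked for me.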
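If you want to try removing the augmentation options one at a time, a hypothetical text-based helper like the one below can strip data_augmentation_options blocks from a copy of the pipeline config. It assumes every block has balanced braces and picks blocks with a plain substring check; parsing the file with protobuf's text_format module would be more robust.

```python
import re
from typing import Optional

# Matches the start of a block such as "data_augmentation_options {".
BLOCK_START = re.compile(r"data_augmentation_options\s*\{")


def remove_augmentations(config_text: str, only: Optional[str] = None) -> str:
    """Drop data_augmentation_options blocks from pipeline config text.

    Hypothetical helper: if `only` is given, drop just the blocks whose text
    contains that string (e.g. an op name); otherwise drop all of them.
    """
    pieces = []
    pos = 0
    while True:
        match = BLOCK_START.search(config_text, pos)
        if match is None:
            pieces.append(config_text[pos:])
            return "".join(pieces)
        # Walk forward from the opening brace until the braces balance again.
        end = match.end() - 1
        depth = 0
        while True:
            if config_text[end] == "{":
                depth += 1
            elif config_text[end] == "}":
                depth -= 1
                if depth == 0:
                    break
            end += 1
        block = config_text[match.start():end + 1]
        if only is None or only in block:
            pieces.append(config_text[pos:match.start()])  # skip this block
        else:
            pieces.append(config_text[pos:end + 1])  # keep this block
        pos = end + 1
```

For example, remove_augmentations(text, only="random_horizontal_flip") would drop only a block mentioning that op and leave the others in place; write the result to a new config file rather than overwriting the original.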