Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]



I am trying to train a model. At first I had a dataset of 5,000 images and training worked fine. I have now added some more images, and my dataset contains 6,423 images. I am using Python 3.6.1 on Ubuntu 18.04, my TensorFlow version is 1.15 and my NumPy version is 1.16 (I had the same versions before, and it worked fine). Now when I run:

python model_main.py --logtostderr --pipeline_config_path=training/faster_rcnn_resnet50_coco.config --model_dir=training

it spends a few minutes setting up, and after the following lines:

INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt. 
I1123 10:26:21.548237 140482563244160 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt. 
2019-11-23 10:28:30.801453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0 

I get the following error:

2019-11-23 10:08:38.843259: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.843323: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.843345: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.                 
2019-11-23 10:08:38.851405: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.851488: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.851512: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.                 
2019-11-23 10:08:38.851807: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.               
2019-11-23 10:08:38.851848: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.                 
2019-11-23 10:08:38.851899: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.               
Traceback (most recent call last):                                                                                                                                                                                                             
File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call                                                                                                                                 
return fn(*args)                                                                                                                                                                                                                           
File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn                                                                                                                                  
target_list, run_metadata)                                                                                                                                                                                                                 
File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun                                                                                                                      
run_metadata)                                                                                                                                                                                                                            
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.                                                                                                                                                           
(0) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]                                                                                                   
[[{{node IteratorGetNext}}]]                                                                                                                                                                                                                 
[[ToAbsoluteCoordinates_118/Assert/AssertGuard/Assert/data_0/_5709]]                                                                                                                                                                  
(1) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]                                                                                                   
[[{{node IteratorGetNext}}]]                                                                                                                                                                                                        
0 successful operations.                                                                                                                                                                                                                     
0 derived errors ignored. 

and training stops.

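The new images you added appear to have a resolution of 585x1024, which differs from the size the model expects (600x799).

If so, the solution is to resize those new images accordingly.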

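If you need a batch size > 1, you can resize the images to a uniform size by using the right image_resizer in the config. It has to be one of the resizers defined in the image_resizer protobuf file, which I assume is what is used to parse that part of the config.

For example (stolen from here):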

image_resizer {
  fixed_shape_resizer {
    height: 600
    width: 800
  }
}

This seems to have solved the problem for me.
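To see how one dataset can produce two different shapes like those in the error, here is a minimal sketch of the arithmetic behind an aspect-preserving resize compared with a fixed-shape resize. It assumes the original config used a keep_aspect_ratio_resizer with min_dimension 600 and max_dimension 1024; the question does not show the config, so that is a guess. The helper names and the sample image sizes are hypothetical, and the rounding is simplified, so results may differ by a pixel from what the Object Detection API produces.

```python
def keep_aspect_ratio_shape(height, width, min_dimension=600, max_dimension=1024):
    """Approximate output (height, width) of an aspect-preserving resize.

    Hypothetical helper for illustration; not the Object Detection API's code.
    """
    # First try to scale the shorter side to min_dimension.
    scale = min_dimension / min(height, width)
    new_height, new_width = round(height * scale), round(width * scale)
    # If the longer side then exceeds max_dimension, scale the longer side
    # down to max_dimension instead.
    if max(new_height, new_width) > max_dimension:
        scale = max_dimension / max(height, width)
        new_height, new_width = round(height * scale), round(width * scale)
    return new_height, new_width


def fixed_shape(height, width, target_height=600, target_width=800):
    """A fixed-shape resize ignores the input size entirely."""
    return target_height, target_width


# Hypothetical source images: one 4:3 and one 16:9.
images = [(600, 800), (1080, 1920)]

aspect_shapes = {keep_aspect_ratio_shape(h, w) for h, w in images}
fixed_shapes = {fixed_shape(h, w) for h, w in images}

print(aspect_shapes)  # two different shapes, so they cannot share a batch
print(fixed_shapes)   # a single shape, so batching works
```

Under these assumptions, images with different aspect ratios come out of the aspect-preserving resizer with different shapes, while the fixed-shape resizer always yields one shape, which is why it allows a batch size above 1.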

Changing batch_size to 1 solved this for me.

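Within a mini-batch, all images must have the same size, so you have to either resize all of your photos to the same size or set the batch size to 1.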
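The constraint can be illustrated without TensorFlow. The sketch below is a simplified stand-in for the batching check, not TensorFlow's actual implementation; element_count and can_batch are hypothetical names. It uses the two shapes from the error message.

```python
def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def can_batch(shapes):
    """Return True if every shape is identical, so the tensors can be stacked."""
    return len(set(shapes)) <= 1


tensor_shape = (585, 1024, 3)  # shape reported as [tensor] in the error
batch_shape = (600, 799, 3)    # shape reported as [batch] in the error

print(element_count(tensor_shape), element_count(batch_shape))
print(can_batch([batch_shape, tensor_shape]))  # mixed shapes
print(can_batch([tensor_shape]))               # a batch of one always passes
```

The two element counts differ, which is the mismatch the error reports. A batch holding a single image trivially satisfies the check, which is why setting the batch size to 1 avoids the error.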

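I just removed the data augmentation and it worked for me. If you prefer, you can try removing the data augmentation options one at a time, but removing all of them is what worked for me.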
