I am trying to use a pretrained object detection model from the TensorFlow model zoo. Specifically, I picked faster_rcnn_inception_resnet_v2_atrous_oidv4, trained on the Open Images dataset.
Here is my code:
import tensorflow as tf

# Restore the pretrained model
sess = tf.Session()

# First load the meta graph and restore the weights
saver = tf.train.import_meta_graph('pretrained/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12/model.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint('pretrained/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12/'))

# Access the input placeholder and build the feed dict
graph = tf.get_default_graph()
X = graph.get_tensor_by_name('image_tensor:0')
feed_dict = {X: image_raw_feature}  # a batch of uint8 images, shape [batch, height, width, 3]

# Access the output ops we want to run
num_detections = graph.get_tensor_by_name('num_detections:0')
detection_scores = graph.get_tensor_by_name('detection_scores:0')
detection_boxes = graph.get_tensor_by_name('detection_boxes:0')

x1, x2, x3 = sess.run(
    [num_detections, detection_scores, detection_boxes],
    feed_dict
)
The outputs x1, x2, x3 have shapes [4], [4, 100], and [4, 100, 4] respectively. The problem is that I don't know how to decode the results into human-readable labels. I guessed that the total number of object classes is 100, as in x2? But that seems far too small compared with what the Open Images dataset describes.
How do I decode the outputs into labels?
As described in faster_rcnn_meta_arch.py, the output tensors have the following shapes:

detection_boxes: [batch, max_detections, 4]
detection_scores: [batch, max_detections]
detection_classes: [batch, max_detections]
num_detections: [batch]

Here batch = 4 and max_detections = 100, so 100 is not the number of classes: the model simply returns up to 100 detections per image. The output contains all detections, with varying confidence scores, so you will probably want to pick a score threshold and filter out low-confidence detections. Also, detection_boxes holds box encodings in [ymin, xmin, ymax, xmax] order in normalized coordinates; you need the image shape to convert them to absolute pixel coordinates.
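As a minimal sketch of that normalized-to-absolute conversion (pure NumPy; the [ymin, xmin, ymax, xmax] order follows the description above, and the image size here is just an example):

```python
import numpy as np

def to_absolute(boxes, image_height, image_width):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates."""
    boxes = np.asarray(boxes, dtype=np.float32)
    # Scale y-coordinates by the height and x-coordinates by the width
    scale = np.array([image_height, image_width, image_height, image_width],
                     dtype=np.float32)
    return boxes * scale

# One normalized box on a 480x640 image
abs_box = to_absolute([[0.5, 0.25, 1.0, 0.75]], image_height=480, image_width=640)
print(abs_box)  # [[240. 160. 480. 480.]]
```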
For example, suppose you want all detections with score > 0.5:

final_boxes = []
for i in range(len(num_detections)):          # loop over the batch
    n = int(num_detections[i])                # valid detections for image i
    keep = detection_scores[i, :n] > 0.5      # boolean mask over the valid rows
    final_boxes.append(detection_boxes[i, :n][keep])

This gives you, per image, the boxes whose confidence score is above 0.5.
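To get human-readable labels, you also need to fetch detection_classes:0 from the graph and map each class id to a name through the model's label map (for this checkpoint, the Open Images label map shipped with the object_detection repo; it can be turned into a category index with label_map_util from that repo). Here is a sketch with a tiny hypothetical category index standing in for the real one, and made-up classes/scores standing in for sess.run outputs:

```python
import numpy as np

# Toy stand-in for the real Open Images category index; in practice build it
# from the checkpoint's label map, e.g. via
# object_detection.utils.label_map_util.create_category_index_from_labelmap(...)
category_index = {1: {'id': 1, 'name': 'Person'},
                  2: {'id': 2, 'name': 'Car'}}

def decode_labels(classes, scores, threshold=0.5, category_index=category_index):
    """Map class ids for one image to names, keeping only scores above threshold."""
    classes = np.asarray(classes).astype(int)
    scores = np.asarray(scores)
    keep = scores > threshold
    return [category_index[c]['name'] for c in classes[keep]]

# Hypothetical per-image outputs (one row of detection_classes / detection_scores)
labels = decode_labels(classes=[1, 2, 1], scores=[0.9, 0.6, 0.3])
print(labels)  # ['Person', 'Car']
```

The same score threshold used for the boxes should be used here, so boxes and labels stay aligned.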