Decoding the predictions of a pretrained model into human-readable labels



I am trying to use a pretrained object detection model from the TensorFlow model zoo. Specifically, I picked faster_rcnn_inception_resnet_v2_atrous_oidv4, which was trained on the Open Images dataset.

Here is my code:

import tensorflow as tf

# Restore the pretrained model
sess = tf.Session()
# First, load the meta graph and restore the weights
saver = tf.train.import_meta_graph('pretrained/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12/model.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint('pretrained/faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12/'))

# Access the input placeholder and build a feed_dict for the new data
graph = tf.get_default_graph()
X = graph.get_tensor_by_name('image_tensor:0')
feed_dict = {X: image_raw_feature}  # a batch of images, shape [batch, height, width, 3]

# Access the ops we want to run
num_detections = graph.get_tensor_by_name('num_detections:0')
detection_scores = graph.get_tensor_by_name('detection_scores:0')
detection_boxes = graph.get_tensor_by_name('detection_boxes:0')

x1, x2, x3 = sess.run(
    [num_detections, detection_scores, detection_boxes],
    feed_dict
)

The outputs x1, x2, x3 have shapes [4], [4, 100], and [4, 100, 4] respectively. The problem is that I don't know how to decode the results into human-readable labels. I would guess the total number of object classes is the 100 in x2, but that seems far too small compared to what the Open Images dataset describes.

How can I decode the outputs into labels?

As described in faster_rcnn_meta_arch.py, the output tensors have the following shapes:

detection_boxes: [batch, max_detections, 4]
detection_scores: [batch, max_detections]
detection_classes: [batch, max_detections]
num_detections: [batch]

Here batch = 4 and max_detections = 100. The output contains all detections, with varying confidence scores, so you will probably want to pick a score threshold to filter out the low-confidence ones. Also, detection_boxes contains box encodings in the order ymin, xmin, ymax, xmax, in normalized coordinates, so you need the image's shape to recover absolute coordinates.
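That conversion can be sketched as follows, assuming `boxes` is an [N, 4] array of normalized [ymin, xmin, ymax, xmax] rows and you know the image's height and width (the function name is mine, not part of the API):

```python
import numpy as np

def to_absolute(boxes, height, width):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates."""
    scale = np.array([height, width, height, width], dtype=np.float32)
    return boxes * scale

# One normalized box on an image of height 200 and width 400
boxes = np.array([[0.25, 0.1, 0.75, 0.5]], dtype=np.float32)
print(to_absolute(boxes, 200, 400))  # [[ 50.  40. 150. 200.]]
```

Note that ymin/ymax scale by the height and xmin/xmax by the width, since the coordinates are normalized per axis.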

For example, suppose you want all detections with score > 0.5:

final_boxes = []
for i in range(detection_boxes.shape[0]):  # iterate over the batch
    # keep only the boxes of image i whose score exceeds the threshold
    final_boxes.append(detection_boxes[i, detection_scores[i] > 0.5, :])

This gives you the detections whose confidence score is above 0.5.
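To get actual label strings, you also need to fetch detection_classes:0 and map each class id to a name using the label map that ships with the Object Detection API (for this model, oid_v4_label_map.pbtxt under object_detection/data). As a minimal sketch, here is a simplistic parser for that pbtxt format; the two sample items below are illustrative, and in practice the API's own label_map_util is the more robust choice:

```python
import re

def load_label_map(pbtxt_text):
    """Parse label_map.pbtxt text into {id: display_name} (minimal illustrative parser)."""
    category_index = {}
    for block in re.findall(r'item\s*\{(.*?)\}', pbtxt_text, re.DOTALL):
        id_match = re.search(r'id:\s*(\d+)', block)
        name_match = re.search(r'display_name:\s*"([^"]*)"', block)
        if id_match and name_match:
            category_index[int(id_match.group(1))] = name_match.group(1)
    return category_index

# Two items in the same format as oid_v4_label_map.pbtxt
sample = '''
item {
  name: "/m/011k07"
  id: 1
  display_name: "Tortoise"
}
item {
  name: "/m/011q46kg"
  id: 2
  display_name: "Container"
}
'''
category_index = load_label_map(sample)
# detection_classes comes back as floats, so cast before lookup
labels = [category_index[int(c)] for c in [1.0, 2.0]]
print(labels)  # ['Tortoise', 'Container']
```

In a real pipeline you would apply this lookup to the thresholded detection_classes values, one image of the batch at a time, alongside the filtered boxes.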