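Converting Deeplab to TensorRT

Converting a Deeplab TensorFlow model to a TensorRT model greatly increases inference time. What am I doing wrong in my code?

Here I convert the TensorFlow graph to a TensorRT graph and save the new TRT model: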

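import tensorflow
import tensorflow.contrib.tensorrt as trt  # TF-TRT as shipped with TensorFlow 1.x
from tensorflow.python.platform import gfile

OUTPUT_NAME = ["SemanticPredictions"]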
# read Tensorflow frozen graph
with gfile.FastGFile('/frozen_inference_graph.pb', 'rb') as tf_model:
   tf_graphf = tensorflow.GraphDef()
   tf_graphf.ParseFromString(tf_model.read())
# convert (optimize) frozen model to TensorRT model
trt_graph = trt.create_inference_graph(input_graph_def=tf_graphf, outputs=OUTPUT_NAME, max_batch_size=2, max_workspace_size_bytes=2 * (10 ** 9), precision_mode="INT8")
# write the TensorRT model to be used later for inference
with gfile.FastGFile("TensorRT_model.pb", 'wb') as f:
   f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")

In another script, I load this TRT model again and use it for semantic segmentation prediction, but it is about 7 to 8 times slower! Here is the second script:

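import cv2
import numpy as np
import tensorflow
from tensorflow.python.platform import gfile
# label_to_color_image is not defined in this snippet; presumably the helper from the DeepLab demo

with tensorflow.Session(config=tensorflow.ConfigProto(gpu_options=tensorflow.GPUOptions(per_process_gpu_memory_fraction=0.50))) as sess: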
   img_array = cv2.imread('test.png',1)
   # read TensorRT frozen graph
   with gfile.FastGFile('TensorRT_model.pb', 'rb') as trt_model:
      trt_graph = tensorflow.GraphDef()
      trt_graph.ParseFromString(trt_model.read())
   # obtain the corresponding input-output tensor
   tensorflow.import_graph_def(trt_graph, name='')
   input = sess.graph.get_tensor_by_name('ImageTensor:0')
   output = sess.graph.get_tensor_by_name('SemanticPredictions:0')
   # perform inference
   batch_seg_map = sess.run(output, feed_dict={input: [img_array]})
   seg_map = batch_seg_map[0]
   seg_img = label_to_color_image(seg_map).astype(np.uint8)

Any ideas on how I should perform the conversion correctly so that it actually speeds up inference?

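From my experience converting a Deeplab model with TRT, INT8 mode does not perform well because the model contains many unsupported ops. As a result, the graph is "broken" into many small subgraphs, and only some of them are converted to TRT. I was able to convert the model correctly in FP16 mode and speed up inference somewhat.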

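P.S. If you still want to use INT8, you don't necessarily need a calibration file; you only need some input images on which the model can be run for calibration.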
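One way to see how fragmented a converted graph is would be to count the nodes that became TensorRT engines versus the nodes left as native TensorFlow ops. The following is a minimal sketch, not part of the original answer. It assumes the converted graph marks TRT subgraphs with the op type `TRTEngineOp` (the name TF-TRT uses), and the function name `summarize_trt_conversion` is made up for illustration.

```python
from collections import Counter


def summarize_trt_conversion(graph_def):
    """Count TRT engine nodes versus nodes left as native TensorFlow ops.

    `graph_def` is any object with a `.node` sequence whose items carry an
    `.op` string, such as the GraphDef returned by create_inference_graph.
    """
    op_counts = Counter(node.op for node in graph_def.node)
    engines = op_counts.get("TRTEngineOp", 0)  # assumed TF-TRT engine op name
    native = sum(op_counts.values()) - engines
    return {"trt_engines": engines, "native_nodes": native, "op_counts": op_counts}


# Usage, assuming `trt_graph` is the converted GraphDef from the question:
# summary = summarize_trt_conversion(trt_graph)
# print(summary["trt_engines"], "engines;", summary["native_nodes"], "native nodes")
# print(summary["op_counts"].most_common(10))
```

Many engines alongside a large number of remaining native nodes would be consistent with the fragmentation described above. Rerunning the conversion with `precision_mode="FP16"` and comparing the two summaries is one way to check whether FP16 converts more cleanly for a given model.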

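Given that you set the precision mode to INT8, I think you are running the calibration algorithm rather than inference. Calibration is much slower than inference because it collects statistics and sets the quantization ranges.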

After calling create_inference_graph, you need to call calib_graph_to_infer_graph.

For an example, see: https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py#L500
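The INT8 flow described in this answer can be sketched in three steps: build the calibration graph, run it on representative inputs, then convert it to an inference graph. This is only a sketch under assumptions: it targets the TensorFlow 1.x `tensorflow.contrib.tensorrt` API used in the question, reuses the question's tensor names and conversion parameters, and the helper names `build_int8_graph` and `make_image_calibrator` are invented for illustration. Calibration and the final conversion are expected to happen in the same process.

```python
def build_int8_graph(frozen_graph_def, output_names, run_calibration, trt=None):
    """Convert a frozen graph to INT8 in three steps (TF 1.x contrib TF-TRT).

    run_calibration: callable that takes the calibration GraphDef and runs it
        on representative inputs (see make_image_calibrator below).
    trt: module providing create_inference_graph and
        calib_graph_to_infer_graph; defaults to tensorflow.contrib.tensorrt.
    """
    if trt is None:
        import tensorflow.contrib.tensorrt as trt  # TF 1.x only

    # Step 1: with precision_mode="INT8" this returns a *calibration* graph.
    calib_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        outputs=output_names,
        max_batch_size=2,
        max_workspace_size_bytes=2 * (10 ** 9),
        precision_mode="INT8",
    )
    # Step 2: run the calibration graph on sample inputs to collect statistics.
    run_calibration(calib_graph)
    # Step 3: turn the calibrated graph into the graph used for inference.
    return trt.calib_graph_to_infer_graph(calib_graph)


def make_image_calibrator(images, input_name="ImageTensor:0",
                          output_name="SemanticPredictions:0"):
    """Return a run_calibration callable that feeds `images` one at a time."""
    def run_calibration(calib_graph_def):
        import tensorflow as tf  # TF 1.x session API
        with tf.Graph().as_default() as graph:
            tf.import_graph_def(calib_graph_def, name="")
            with tf.Session(graph=graph) as sess:
                for image in images:
                    sess.run(output_name, feed_dict={input_name: [image]})
    return run_calibration


# Usage, assuming `tf_graphf` is the frozen GraphDef and `images` is a list
# of HxWx3 uint8 arrays representative of the real input data:
# int8_graph = build_int8_graph(tf_graphf, ["SemanticPredictions"],
#                               make_image_calibrator(images))
```

Saving the graph returned by step 3, rather than the one returned by step 1, is what the second script should then load for inference.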

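I have converted my deeplabv3+ model to a TensorRT-optimized pb graph using the TF-TRT developer guide. I am running the model on a Jetson Nano developer kit. Based on my experience, I think you need to check the following: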

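1. Does your hardware (GPU) support INT8? In my case, the Jetson Nano does not support INT8 (the graph was converted, but inference took longer). During my research, I found that the GPU should have FP16/FP32 tensor cores to run the model as expected (reference linked in the original post).
2. Do the ops in your TensorFlow model support INT8/FP16/FP32 precision? For deeplabv3+, I got similar performance (time and IoU) with the FP16- and FP32-optimized graphs. For INT8, calibration failed (reference linked in the original post). To check which ops are supported, see the list of supported ops linked in the original post.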
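The second check can be partly automated by comparing the op types in the frozen graph against a list of ops that TF-TRT supports. The sketch below is illustrative only: the function name `ops_outside_supported_set` is made up, and the supported set must be copied from the TF-TRT documentation for the version in use; the set in the usage comment is a placeholder, not the real list.

```python
def ops_outside_supported_set(graph_def, supported_ops):
    """Return op types in graph_def that are not in supported_ops, with counts.

    `graph_def` is any object with a `.node` sequence whose items carry an
    `.op` string. The result is ordered by descending count, then by name.
    """
    missing = {}
    for node in graph_def.node:
        if node.op not in supported_ops:
            missing[node.op] = missing.get(node.op, 0) + 1
    return dict(sorted(missing.items(), key=lambda kv: (-kv[1], kv[0])))


# Usage, assuming `tf_graphf` is the frozen GraphDef from the question:
# SUPPORTED = {"Conv2D", "Relu", "BiasAdd"}  # placeholder, fill in from the docs
# for op, count in ops_outside_supported_set(tf_graphf, SUPPORTED).items():
#     print(op, count)
```

Op types that show up often in the result are likely the places where the graph gets split into separate TRT segments.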
