TensorFlow hangs when running inference with the GPU enabled



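I am new to AI and TensorFlow, and I am trying to use the TensorFlow Object Detection API on Windows.
My current goal is real-time human detection in a video stream.
To do this, I modified a Python example from the TensorFlow Model Garden (https://github.com/tensorflow/models).
At the moment it detects all objects (not just humans) and displays the bounding boxes using OpenCV.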
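Restricting the results to people can be done after inference by filtering the output dictionary. The following is a minimal sketch, assuming the dictionary layout returned by run_inference_for_single_image in the script and that "person" has id 1 in mscoco_label_map.pbtxt. The helper name keep_only_people and the 0.5 score threshold are illustrative choices, not part of the Object Detection API.

```python
import numpy as np

PERSON_CLASS_ID = 1  # id of "person" in mscoco_label_map.pbtxt


def keep_only_people(output_dict, min_score=0.5):
    """Return a copy of output_dict limited to confident 'person' detections.

    keep_only_people and min_score are hypothetical names for this sketch.
    """
    classes = np.asarray(output_dict['detection_classes'])
    scores = np.asarray(output_dict['detection_scores'])
    # Boolean mask selecting detections that are people and score high enough.
    mask = (classes == PERSON_CLASS_ID) & (scores >= min_score)

    filtered = dict(output_dict)
    for key in ('detection_boxes', 'detection_classes', 'detection_scores'):
        filtered[key] = np.asarray(output_dict[key])[mask]
    filtered['num_detections'] = int(mask.sum())
    return filtered
```

The filtered dictionary could then be passed to the visualization call in place of the original one. This only changes what is drawn; it does not affect the hang itself.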

When I disable the GPU (os.environ["CUDA_VISIBLE_DEVICES"] = "-1"), it works fine.
But when I enable the GPU and start the script, it hangs on the first frame.
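For reference, this is a small sketch of the CPU-only toggle described above. In the script the line is commented out and placed after the TensorFlow import; setting the variable before TensorFlow is imported is the more conservative ordering, although that ordering is a suggestion here and not something the post confirms is required. The helper name hide_gpus_from_cuda is hypothetical.

```python
import os


def hide_gpus_from_cuda():
    """Make CUDA report no devices to this process.

    Intended to be called before TensorFlow is imported or first touches the GPU.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus_from_cuda()
# import tensorflow as tf  # import only after the variable has been set
```

Removing the call (or leaving the variable unset) lets TensorFlow see the GPU again, which makes it easy to switch between the working and the hanging configuration.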

Output:

2020-04-22 16:00:53.597492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:56.942141: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-22 16:00:56.976635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:56.989129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:57.000622: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:57.012247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:57.020575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:57.031536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:57.042564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:57.066289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:57.075760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:00:59.239211: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-22 16:00:59.256577: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f3f73cd670 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:00:59.264241: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-22 16:00:59.272280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 74.65GiB/s
2020-04-22 16:00:59.281409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-22 16:00:59.288204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:00:59.293112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-22 16:00:59.298222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-22 16:00:59.305446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-22 16:00:59.310590: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-22 16:00:59.316250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:00:59.324588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-04-22 16:01:00.831569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-22 16:01:00.839147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-04-22 16:01:00.842279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-04-22 16:01:00.846140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1024 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-04-22 16:01:00.865546: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f39174cba0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-22 16:01:00.873656: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
2020-04-22 16:01:10.876733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-22 16:01:11.814909: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-04-22 16:01:11.852909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-22 16:01:12.149312: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.179484: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.209036: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.237205: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.266147: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.295182: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.325645: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.357550: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.405332: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-04-22 16:01:12.436336: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

Here is the code I am using:

#!/usr/bin/env python
# coding: utf-8
import os
import pathlib

if "models" in pathlib.Path.cwd().parts:
    while "models" in pathlib.Path.cwd().parts:
        os.chdir('..')

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from PIL import Image
from IPython.display import display
import cv2

cap = cv2.VideoCapture(1)

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1
# Patch the location of gfile
tf.gfile = tf.io.gfile

# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def load_model(model_name):
    base_url = 'http://download.tensorflow.org/models/object_detection/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)
    model_dir = pathlib.Path(model_dir)/"saved_model"
    model = tf.saved_model.load(str(model_dir))
    model = model.signatures['serving_default']
    return model


# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
# model_name= 'faster_rcnn_inception_v2_coco_2017_11_08';
detection_model = load_model(model_name)
print(detection_model.inputs)
detection_model.output_dtypes
detection_model.output_shapes


def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis,...]
    # Run inference (it hangs here)
    output_dict = model(input_tensor)
    # All outputs are batches tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key:value[0, :num_detections].numpy()
                   for key,value in output_dict.items()}
    output_dict['num_detections'] = num_detections
    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'], output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    return output_dict


def show_inference(model):
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    ret, image_np = cap.read()
    #percent by which the image is resized
    #scale_percent = 30
    #calculate the 50 percent of original dimensions
    #width = int(image_np.shape[1] * scale_percent / 100)
    #height = int(image_np.shape[0] * scale_percent / 100)
    # dsize
    #dsize = (width, height)
    # resize image
    #image_np = cv2.resize(image_np, dsize)
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks_reframed', None),
        use_normalized_coordinates=True,
        line_thickness=8)
    cv2.imshow('object detection', cv2.resize(image_np, (800,600)))


while True:
    show_inference(detection_model)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break

I have the following versions installed:
Python: 3.7 (64-bit)
TensorFlow: 2.2.0-rc3
CUDA: 10.1
cuDNN: 7.6.5.32

I tried it on two different machines:
Machine 1:
- CPU: i7-6700HQ
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960M

Machine 2:
- CPU: i5-6400
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 960

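I don't know how to debug this. I tried the same code on both machines and the result is almost the same (the only difference is how long it takes to hang: machine 1 hangs immediately, machine 2 takes about 30 seconds).
Machine 2 is able to process the video and detect objects until it hangs.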

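I looked into the "Allocator (GPU_0_bfc) ran out of memory" warning.
I tried a few options that limit the amount of GPU memory available, but that did not help.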
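The post does not say which memory options were tried. As an illustration only, the two usual TensorFlow 2.x approaches are on-demand memory growth and a hard per-process cap. The sketch below uses the tf.config.experimental calls for both; the function name configure_gpu_memory and the 1024 MB figure in the usage comment are assumptions made for this example.

```python
def configure_gpu_memory(memory_limit_mb=None):
    """Enable memory growth, or cap GPU memory when memory_limit_mb is given.

    Must run before any op initializes the GPU, otherwise TensorFlow
    raises a RuntimeError. Returns the names of the configured GPUs.
    """
    import tensorflow as tf  # imported here so the setting precedes any GPU use

    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        if memory_limit_mb is None:
            # Allocate GPU memory on demand instead of reserving it up front.
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Expose a single logical GPU with a fixed memory budget.
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [tf.config.experimental.VirtualDeviceConfiguration(
                    memory_limit=memory_limit_mb)])
    return [gpu.name for gpu in gpus]


# Example: configure_gpu_memory()        -> memory growth
#          configure_gpu_memory(1024)    -> cap at roughly 1 GB
```

Since the question reports that limiting memory did not help, this is mainly useful for ruling memory pressure in or out, not as a known fix.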

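There are also multiple posts that suggest reducing the batch size.
My understanding is that this only helps when training your own model.
Since I am using a pre-trained model, it does not apply here.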

I also tried different models: ssd_mobilenet_v1_coco_2017_11_17 and faster_rcnn_inception_v2_coco_2017_11_08. The result is the same with both models.

The last thing I tried was reducing the image size before processing. That did not help either.
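The downscaling step corresponds to the commented-out lines in show_inference. A small sketch of the size calculation follows; the helper name scaled_size is hypothetical, and the clamp to a minimum of one pixel is an added safeguard that is not in the original script.

```python
def scaled_size(width, height, scale_percent):
    """Return (width, height) scaled to scale_percent of the original size."""
    if scale_percent <= 0:
        raise ValueError("scale_percent must be positive")
    # Truncate to whole pixels, but never go below 1 pixel per side.
    new_width = max(1, int(width * scale_percent / 100))
    new_height = max(1, int(height * scale_percent / 100))
    return new_width, new_height


# Usage with OpenCV (dsize is given as (width, height)):
# image_np = cv2.resize(
#     image_np, scaled_size(image_np.shape[1], image_np.shape[0], 30))
```

Note that image_np.shape is ordered (height, width, channels), while cv2.resize expects (width, height), which is why the indices are swapped in the usage comment.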

Any help would be appreciated.

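Update
I also tried it on an RTX 2070 SUPER GPU. There are no warnings about memory allocation, but it is still unable to complete a single inference. For completeness, here is the console output (the text "inference start" is printed before running inference; if inference finished, it would print "inference end"):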
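The start and end markers can be produced with a small wrapper around the model call. This is a sketch of one way to do it, not the exact code used for the log; the name timed_call and the elapsed-time suffix are additions for illustration. Flushing the output matters here, because a buffered "inference start" line might otherwise never appear if the process hangs.

```python
import time


def timed_call(fn, *args, label="inference", **kwargs):
    """Call fn, printing a marker before and after so a hang is easy to locate."""
    print(f"{label} start", flush=True)
    started = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - started
    print(f"{label} end ({elapsed:.3f} s)", flush=True)
    return result


# Usage inside run_inference_for_single_image:
# output_dict = timed_call(model, input_tensor)
```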

2020-04-24 11:30:16.579805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.916146: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-24 11:30:18.941805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.946134: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.951172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.954809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.957258: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.961662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.965553: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.978671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:18.980998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:18.982226: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-24 11:30:18.984167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-24 11:30:18.987291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-24 11:30:18.988809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-24 11:30:18.990303: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-24 11:30:18.991792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-24 11:30:18.993320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-24 11:30:18.996960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-24 11:30:18.998497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:19.000191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-24 11:30:19.430864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-24 11:30:19.433076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-04-24 11:30:19.434566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-04-24 11:30:19.436400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6281 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
inference start
2020-04-24 11:30:24.728554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-24 11:30:25.608426: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-24 11:30:25.625904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll

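Update 2
When eager mode is disabled, everything runs fine (even on the GPU), but then I cannot retrieve the detected objects.
The next thing I tried was running it with a session (as in TensorFlow 1, I think). Here, the call to session.run() blocks indefinitely on the GPU. Again, on the CPU it works fine.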

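If you are using a GPU, try installing tensorflow-gpu. According to the documentation, the tensorflow package you are using seems to support the GPU already, but you can try specifying the GPU package explicitly. Try this in a Python virtual environment first.

Uninstall tensorflow: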

pip uninstall tensorflow

Uninstall tensorflow-gpu (run this even if you are not sure whether it is installed):

pip uninstall tensorflow-gpu

Install a specific tensorflow-gpu version:

pip install tensorflow-gpu==2.0.0
