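I'm using YOLOv3 from Ultralytics (PyTorch) to detect cow behavior in videos. YOLOv3 was trained to detect every cow in the video. Each cow is then cropped out of the frame using the X and Y coordinates of its bounding box, and each crop is passed through a second model that decides whether the cow is sitting or standing. The second model was also trained on our own dataset; it uses TensorFlow and is a very simple InceptionV3 model.
However, whenever I try to load both models, I get the following error: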
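The crop step described above can be sketched with plain NumPy slicing. This is a minimal sketch, not the author's code: it assumes the detector's boxes are available as pixel (x1, y1, x2, y2) corners and that frames are H×W×C arrays; `crop_detections` is a hypothetical helper name. Clamping matters because a negative index would silently wrap around in NumPy instead of raising an error.

```python
import numpy as np


def crop_detections(frame, boxes):
    """Return one cropped sub-image per bounding box.

    frame: H x W x C image array (e.g. a decoded video frame).
    boxes: iterable of (x1, y1, x2, y2) pixel corners (assumed xyxy format).
    Coordinates are rounded, ordered and clamped to the frame, so boxes that
    extend past the border are trimmed and boxes fully outside are skipped.
    """
    h, w = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Round to integer pixels and make sure x1 <= x2 and y1 <= y2
        x1, x2 = sorted((int(round(x1)), int(round(x2))))
        y1, y2 = sorted((int(round(y1)), int(round(y2))))
        # Clamp to the frame so negative indices cannot wrap around
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, w), min(y2, h)
        if x2 <= x1 or y2 <= y1:
            continue  # box lies entirely outside the frame
        crops.append(frame[y1:y2, x1:x2].copy())
    return crops


# Usage: a blank 640x480 frame with one inner box and one box past the border
frame = np.zeros((480, 640, 3), dtype=np.uint8)
crops = crop_detections(frame, [(10, 20, 110, 220), (600, 400, 700, 500)])
print([c.shape for c in crops])  # [(200, 100, 3), (80, 40, 3)]
```

Each crop would still have to be resized to the classifier's input size (Keras' InceptionV3 defaults to 299×299) before being passed to the second model.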
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 16.00 GiB total capacity; 427.42 MiB already allocated; 7.50 MiB free; 448.00 MiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
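If the second model is not loaded, YOLOv3 (PyTorch) runs without any problem, and it doesn't even use the full 16 GB of VRAM. Is YOLOv3 reserving the entire VRAM and leaving nothing for the TensorFlow-based InceptionV3? If so, is there any way to force PyTorch to leave 2 GB of VRAM aside?
The full output: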
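For the literal question: recent PyTorch releases (including the torch 1.11.0 shown in the log below) provide `torch.cuda.set_per_process_memory_fraction`, which caps how much of a device PyTorch's caching allocator may use. A minimal sketch, assuming "leave 2 GB aside" means 2 GiB of the device's total memory; `fraction_leaving` and `cap_torch_memory` are hypothetical helper names. The cap only limits PyTorch itself: it does not reserve memory for another library and does not stop TensorFlow from pre-allocating, so it cannot help when TensorFlow is the one holding the memory.

```python
GIB = 1024 ** 3


def fraction_leaving(total_bytes, keep_free_bytes):
    """Fraction of total GPU memory PyTorch may use so keep_free_bytes stay unused by it."""
    if not 0 <= keep_free_bytes < total_bytes:
        raise ValueError("keep_free_bytes must be in [0, total_bytes)")
    return (total_bytes - keep_free_bytes) / total_bytes


def cap_torch_memory(keep_free_gib=2.0, device=0):
    """Cap PyTorch's caching allocator on `device`; returns the fraction applied."""
    import torch  # imported lazily so fraction_leaving works without torch installed

    total = torch.cuda.get_device_properties(device).total_memory
    fraction = fraction_leaving(total, int(keep_free_gib * GIB))
    torch.cuda.set_per_process_memory_fraction(fraction, device)
    return fraction


# Usage: on a 16 GiB card, leaving 2 GiB means PyTorch may use 87.5 %
print(fraction_leaving(16 * GIB, 2 * GIB))  # 0.875
```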
>> python detectv2.py --weights best.pt --source outch06_20181022073801_0_10.avi
2022-06-01 16:02:40.975544: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-01 16:02:41.342394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14123 MB memory: -> device: 0, name: Quadro RTX 5000, pci bus id: 0000:65:00.0, compute capability: 7.5
detectv2: weights=['best.pt'], source=outch06_20181022073801_0_10.avi, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
Empty DataFrame
Columns: []
Index: []
YOLOv3 2022-5-16 torch 1.11.0 CUDA:0 (Quadro RTX 5000, 16384MiB)
Fusing layers...
Model Summary: 269 layers, 62546518 parameters, 0 gradients
Traceback (most recent call last):
  File "detectv2.py", line 462, in <module>
    main(opt)
  File "detectv2.py", line 457, in main
    run(**vars(opt))
  File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "detectv2.py", line 221, in run
    model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.model.parameters()))) # warmup
  File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sourav\yolov3-master\models\common.py", line 357, in forward
    y = self.model(im) if self.jit else self.model(im, augment=augment, visualize=visualize)
  File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sourav\yolov3-master\models\yolo.py", line 127, in forward
    return self._forward_once(x, profile, visualize) # single-scale inference, train
  File "C:\Users\sourav\yolov3-master\models\yolo.py", line 150, in _forward_once
    x = m(x) # run
  File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sourav\yolov3-master\models\common.py", line 48, in forward_fuse
    return self.act(self.conv(x))
  File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\conv.py", line 444, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 16.00 GiB total capacity; 427.42 MiB already allocated; 7.50 MiB free; 448.00 MiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
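PyTorch was not hogging the GPU as I had assumed; it was the other way around. I was starting the TensorFlow model first, and it took up the entire GPU memory, leaving nothing for PyTorch.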
The solution:
For TensorFlow 2.2+
For a single GPU:
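import tensorflow as tf
# Call this before TensorFlow initializes the GPU (before building or loading any model);
# otherwise set_memory_growth raises a RuntimeError.
gpu = tf.config.experimental.list_physical_devices('GPU')[0]
tf.config.experimental.set_memory_growth(gpu, True)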
More details can be found in this post and in this documentation.
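The single-GPU snippet above only configures the first GPU. A sketch that applies the same setting to every visible GPU, or alternatively gives TensorFlow a fixed slice instead of on-demand growth (closer to the "2 GB" idea in the question, applied to TensorFlow rather than PyTorch). It assumes a recent TensorFlow 2.x where `tf.config.list_physical_devices`, `tf.config.set_logical_device_configuration` and `tf.config.LogicalDeviceConfiguration` are available; `configure_tf_gpus` is a hypothetical name, and the sketch picks one option or the other per GPU rather than combining them.

```python
def configure_tf_gpus(memory_limit_mb=None):
    """Stop TensorFlow from pre-allocating (almost) all GPU memory.

    Must run before TensorFlow initializes the GPUs, i.e. before any model
    is built or loaded; otherwise TensorFlow raises a RuntimeError.

    memory_limit_mb=None -> enable on-demand memory growth on every GPU.
    memory_limit_mb=N    -> give TensorFlow a fixed N MB slice of each GPU.
    Returns the number of GPUs configured.
    """
    import tensorflow as tf  # lazy import: defining the function needs no TensorFlow

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        if memory_limit_mb is None:
            # Allocate only what is needed, growing as required
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Expose a single logical device capped at memory_limit_mb
            tf.config.set_logical_device_configuration(
                gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)]
            )
    return len(gpus)
```

Call `configure_tf_gpus()` (or, for example, `configure_tf_gpus(memory_limit_mb=2048)`) at the top of the script, before the InceptionV3 model is loaded or any other TensorFlow operation runs. With memory growth, TensorFlow's usage rises as needed but is not handed back, so the PyTorch model can only use what TensorFlow has not yet claimed.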