"Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED"应该开箱即用的项目



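The project is https://github.com/zzh8829/yolov3-tf2. I have installed what I believe are all the correct versions.

Google tells me this may be a low-VRAM problem, but I am still looking for other causes. Please help. I am using:

Windows 10 (please don't say "there's your problem", I need it)

cuDNN 7.4.6

CUDA 10.0

TensorFlow 2.0.0

Python 3.6

I have a GTX 1660 Super with 6 GB of VRAM and a Ryzen 7 2700X with 7 GB of RAM. In a few days I will be getting a GTX 1080 with 8 GB, which I will add in the second PCI slot.

The error is as follows: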

2019-11-30 06:31:26.167368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-30 06:31:27.843742: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2019-11-30 06:31:27.853725: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Traceback (most recent call last):
  File ".\convert.py", line 34, in <module>
    app.run(main)
  File "C:\Program Files\Python36\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "C:\Program Files\Python36\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File ".\convert.py", line 25, in main
    output = yolo(img)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__
    return self.conv_op(inp, filter)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__
    return self.call(inp, filter)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__
    name=self.name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
    name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback
    ctx=_ctx, name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

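I ran into the same problem with the same repository.

The solution that worked for me and my team was to upgrade cuDNN to 7.5 or later (instead of your 7.4).

Upgrade instructions can be found on Nvidia's website:
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html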

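There are a few possible reasons this could be happening.

(1) As you mentioned, it could be a memory issue. You can try to verify this by allocating less memory to the GPU and seeing whether the error still occurs. In TF 2.0 you can do that as follows (https://github.com/tensorflow/tensorflow/issues/25138#issuecomment-484428798):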

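import tensorflow as tf

# Note: the tf.config.gpu.* calls below come from the linked comment, which was
# written against a TF 2.0 pre-release. If they raise AttributeError on your
# build, look under tf.config.experimental for the equivalent settings.
tf.config.gpu.set_per_process_memory_fraction(0.75)
tf.config.gpu.set_per_process_memory_growth(True)
# your model creation, etc.
model = MyModel(...)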

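I see that the code you are running sets dynamic memory growth if you have more than one GPU (https://github.com/zzh8829/yolov3-tf2/blob/master/train.py#L46-L47), but since you only have 1 GPU, it is probably just trying to allocate all the memory (>90%) at the start.

(2) Some users seem to have hit this on Windows when another TensorFlow (or similar) process was using the GPU at the same time, whether started by you or by another user: https://stackoverflow.com/a/53707323/10993413

(3) As always, make sure your PATH variable is correct. If you have tried several installations and did not clean up properly, PATH may resolve to the wrong version first and cause problems. If you add the new paths to the beginning of PATH, they should be found first: https://www.tensorflow.org/install/gpu#windows_setup

(4) As @xenotecc mentioned, you can try upgrading to a newer version of cuDNN, though I am not sure that will help, since your configuration is listed as supported in the TF docs: https://www.tensorflow.org/install/source#gpu. If upgrading does fix it, it may have been a PATH problem after all, since you would probably update PATH after installing the newer version.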
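For (3), one quick way to see which copy wins is to walk PATH in order and list every matching DLL. This is only a sketch: the helper name `find_dlls_on_path` and the `cudart64_*.dll` pattern are my own assumptions (only `cudnn64_7.dll` actually appears in the log in the question), so adjust the patterns to whatever your install ships.

```python
import os
from pathlib import Path


def find_dlls_on_path(patterns=("cudnn64_*.dll", "cudart64_*.dll"), path=None):
    """Return {pattern: [matching files]} in the order PATH would resolve them.

    `path` defaults to the current PATH; pass a string to inspect another value.
    """
    path = os.environ.get("PATH", "") if path is None else path
    hits = {pattern: [] for pattern in patterns}
    for entry in path.split(os.pathsep):
        if not entry:
            continue  # skip empty PATH entries
        directory = Path(entry)
        if not directory.is_dir():
            continue  # stale PATH entries are common after uninstalling
        for pattern in patterns:
            hits[pattern].extend(sorted(str(p) for p in directory.glob(pattern)))
    return hits


if __name__ == "__main__":
    for pattern, files in find_dlls_on_path().items():
        print(pattern)
        for index, name in enumerate(files):
            marker = "  <- found first" if index == 0 else ""
            print(f"  {name}{marker}")
```

If more than one file shows up under a pattern, the first one listed is the one that should get loaded, which is worth comparing against the version you think you installed.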

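I got the same error and solved it with the following:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
# memory_limit is given in MB
tf.config.experimental.set_virtual_device_configuration(
    gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5000)])

(Using a GTX 1660 with 6 GB of memory and TensorFlow 2.0.1)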
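If you would rather think in terms of a fraction of VRAM (like the 0.75 suggested in the earlier answer) than a hard-coded number, something along these lines might work. It is a sketch under assumptions: the helper names are mine, the total VRAM figure is something you supply yourself, and TensorFlow is imported lazily inside `limit_first_gpu` so the arithmetic helper can be used on its own.

```python
def fraction_to_memory_limit_mb(total_vram_mb, fraction):
    """Convert a fraction of total VRAM into a memory_limit value in MB."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_vram_mb * fraction)


def limit_first_gpu(memory_limit_mb):
    """Cap TensorFlow's allocation on the first GPU.

    Must be called before any op touches the GPU. Returns False when no GPU
    is visible, True once the limit has been configured.
    """
    import tensorflow as tf  # lazy import: only needed when actually configuring

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if not gpus:
        return False
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit_mb)],
    )
    return True
```

For a nominal 6 GB card, `fraction_to_memory_limit_mb(6144, 0.75)` gives 4608, which you would then pass to `limit_first_gpu(...)` at the very top of the script. The usable VRAM on Windows is usually a bit below the nominal figure, so treat the number as a starting point.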

Simple fix: insert these lines below the imports in "convert.py"

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

This makes TensorFlow ignore your GPU while the weights are being loaded.
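If you would rather not edit convert.py, the same effect can probably be had by launching it in a child process whose environment hides the GPU, so your other scripts still see it. A minimal sketch, assuming the helper names `cpu_only_env` and `run_cpu_only` (both mine) and that you pass through whatever arguments the script normally takes:

```python
import os
import subprocess
import sys


def cpu_only_env(base=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def run_cpu_only(script, *args):
    """Run `script` with the current Python interpreter and no visible GPU.

    Returns the child's exit code; the parent's environment is left untouched.
    """
    completed = subprocess.run([sys.executable, script, *args], env=cpu_only_env())
    return completed.returncode
```

Calling `run_cpu_only("convert.py")` would then run the conversion on the CPU only, without changing the environment of the shell you started it from.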
