How do I set up TensorFlow for an RTX 3070 on Windows?

I am on Windows 10 and trying to get my TensorFlow script working with my new RTX 3070 GPU. I previously ran it on a GTX 980.

TensorFlow installed from: binary (pip3 install tensorflow)
TensorFlow version: tried the latest stable release (v2.4.0-49-g85c8b2a817f 2.4.1) and also the nightly build (see below)
Python version: 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
CUDA/cuDNN version: cuda_11.2.0_460.89_win10, cudnn-11.1-v8.0.5.39
GPU model and memory: seems to be recognized correctly by TF - GeForce RTX 3070 computeCapability: 8.6 coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
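
When comparing setups like the one listed above, it can help to ask TensorFlow itself which CUDA and cuDNN versions its binary was built for. The following is a minimal sketch and not part of the original report: `describe_environment` is a hypothetical helper name, and the sketch assumes TensorFlow 2.3 or later, where `tf.sysconfig.get_build_info()` is available. On a CPU-only build the CUDA and cuDNN entries would simply be `None`.

```python
import platform


def describe_environment():
    """Collect version details that are useful in a GPU setup report."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    try:
        # Imported lazily so the script still runs when TensorFlow is absent.
        import tensorflow as tf
    except ImportError:
        info["tensorflow"] = None
        return info

    info["tensorflow"] = tf.__version__
    # Versions the installed binary was compiled against (not what is on PATH).
    build = tf.sysconfig.get_build_info()
    info["built_for_cuda"] = build.get("cuda_version")
    info["built_for_cudnn"] = build.get("cudnn_version")
    # GPUs that TensorFlow can actually see at runtime.
    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the "built for" versions differ from the toolkit and cuDNN that are installed, that mismatch is a reasonable first thing to investigate.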

Current behavior

The following error occurs:

2021-01-25 21:36:01.042433: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/500
2021-01-25 21:36:03.304809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-25 21:36:03.880223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-25 21:36:03.911531: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-25 21:36:04.515409: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-25 21:36:04.515498: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-01-25 21:36:04.515607: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 110, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 560, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:    Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[sequential/lstm/PartitionedCall]] [Op:__inference_train_function_8782]
Function call stack:
train_function -> train_function -> train_function

tf_2.4.1_issue_on_3070.txt

I also tried the latest nightly build (2.5.0.dev20210125), which ends with this error:

2021-01-25 21:31:05.429799: E tensorflow/stream_executor/dnn.cc:618] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1975): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2021-01-25 21:31:05.430291: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1926 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128] 
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 108, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1134, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 818, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 846, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2994, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1939, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 569, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError:    Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128]
	 [[{{node gradients/CudnnRNN_grad/CudnnRNNBackprop}}]]
	 [[Adam/gradients/PartitionedCall_2]] [Op:__inference_train_function_8936]
Function call stack:
train_function -> train_function -> train_function

tf_nightly_issue_on_3070.txt

Standalone code to reproduce the issue

import datetime
import os

import pandas as pd
from numpy import reshape
import tensorflow as tf

EPOCHS = 500
BATCH_SIZE = 256
TEST_SET_RATIO = 0.2
LEARNING_RATE = 0.001
DECAY = 3e-5
LOSS_FUNC = 'categorical_crossentropy'
DROPOUT = 0.2
OUTPUT_PATH = "e:\\ml"
RNN_SEQ_LEN = 128  # number of RNN/LSTM sequence features
L_AMOUNT = 2  # number of labels
MIN_ACC_TO_SAVE_MODEL = 0.6


def create_model():
    # Reads the module-level TR_FEATURES that the __main__ block defines before calling this.
    new_model = tf.keras.models.Sequential()
    # NETWORK INPUT
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, input_shape=TR_FEATURES.shape[1:], return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT / 2))
    new_model.add(tf.keras.layers.BatchNormalization())
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())
    # NETWORK OUTPUT
    new_model.add(tf.keras.layers.Dense(L_AMOUNT, activation=tf.keras.activations.softmax))
    opt = tf.keras.optimizers.Adam(LEARNING_RATE, decay=DECAY)
    new_model.compile(optimizer=opt,
                      loss=LOSS_FUNC,
                      metrics=['accuracy'])
    print(new_model.summary())
    return new_model


class CustomModelCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    """Checkpoint callback that logs accuracy and only saves above a minimum training accuracy."""

    def __init__(self, fp, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', **kwargs):
        super().__init__(fp, monitor, verbose, save_best_only, save_weights_only, mode, save_freq, **kwargs)

    def on_epoch_end(self, epoch, logs=None):
        print("\n-------------------------------------------------------------------------------------------------------")
        print(f"epoch: {epoch}, training_acc: {round(float(logs['accuracy']), 4)}, validation_acc: {round(float(logs['val_accuracy']), 4)}")
        print("-------------------------------------------------------------------------------------------------------\n")
        # Skip the parent checkpoint logic until training accuracy reaches the threshold.
        if MIN_ACC_TO_SAVE_MODEL <= logs['accuracy']:
            super().on_epoch_end(epoch, logs)


if __name__ == '__main__':
    data_filename = 'train_2020-02-07_pp_x128_3_2_all.csv'
    print("Loading data file: %s" % data_filename)
    dataset = pd.read_csv(data_filename, delimiter=',', header=None)
    dataset = dataset.drop(columns=[0, 1, 2, 3, 4, 5, 6]).values  # drop columns with additional information
    test_set_size = int(len(dataset) * TEST_SET_RATIO)
    print("Test set split at: %d" % test_set_size)
    train_data = dataset[:-test_set_size]
    test_data = dataset[-test_set_size:]  # use most recent data for validation (extract before shuffle)
    # The first RNN_SEQ_LEN columns are features, the next L_AMOUNT columns are labels.
    TR_F = train_data[:, 0:RNN_SEQ_LEN]
    TS_F = test_data[:, 0:RNN_SEQ_LEN]
    TR_L = train_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]
    TS_L = test_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]
    # Reshape to (samples, timesteps, 1), the 3-D input an LSTM layer expects.
    TR_FEATURES = reshape(TR_F, (len(TR_F), RNN_SEQ_LEN, 1))
    TS_FEATURES = reshape(TS_F, (len(TS_F), RNN_SEQ_LEN, 1))
    model = create_model()
    TRAINING_TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    model_name = "sscce_%s" % TRAINING_TIMESTAMP
    os.mkdir("%s\\models\\%s" % (OUTPUT_PATH, model_name))
    filepath = "%s\\models\\%s\\%s--{epoch:02d}-{val_accuracy:.3f}.model" % (OUTPUT_PATH, model_name, model_name)
    checkpoint = CustomModelCheckpoint(filepath,
                                       monitor='val_accuracy',
                                       verbose=1,
                                       save_best_only=True,
                                       mode='max')
    log_dir = "%s\\logs\\fit\\%s.model" % (OUTPUT_PATH, model_name)
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=0)
    model.fit(x=TR_FEATURES,
              y=TR_L,
              epochs=EPOCHS,
              batch_size=BATCH_SIZE,
              shuffle=True,
              validation_data=(TS_FEATURES, TS_L),
              callbacks=[checkpoint, tensorboard])

Sample data file: input_data.zip

Other info / logs

I also added the path to a CUDA 11.0 installation, because without it the following error appears:

2021-01-25 21:44:15.989317: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found

Full system Path:

Path=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;C:\cudnn-11.1-v8.0.5.39\bin;C:\Python36\Scripts;C:\Python36;C:\ProgramData\DockerDesktop\version-bin;C:\Program Files\Docker\Docker\Resources\bin;c:\Java\jdk1.8.0_144_x86;C:\gradle-6.0.1\bin;C:\SVN\bin;C:\MinGW\bin;C:\WinAVR-20100110;c:\avrdude;c:\Android\sdk\platform-tools;C:\adb;C:\TortoiseGit\bin;C:\Git4Windows\cmd;c:\sqlite-tools-win32-x86-3130000;C:\WINDOWS\System32;C:\WINDOWS;C:\WINDOWS\System32\wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\Program Files (x86)\Bitvise SSH Client;C:\Program Files (x86)\Windows Live\Shared;C:\WINDOWS\system32;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\OpenSSH;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\WINDOWS\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.3.0;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR
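
With a Path this long and several CUDA directories on it, it is hard to see which directory each DLL is actually loaded from. The sketch below is one way to check: it walks the Path in order and reports the first directory containing each library. It is an illustration, not part of the original report. `find_on_path` and `WANTED` are hypothetical names, and the DLL names are taken only from the log lines quoted in this thread.

```python
import os


def find_on_path(dll_names, path_value=None):
    """Map each DLL name to the first PATH directory containing it, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Skip empty entries produced by stray or trailing separators.
    directories = [d for d in path_value.split(os.pathsep) if d]
    found = {}
    for name in dll_names:
        found[name] = next(
            (d for d in directories if os.path.isfile(os.path.join(d, name))),
            None,
        )
    return found


# Library names that appear in the TensorFlow log output quoted above.
WANTED = [
    "cublas64_11.dll",
    "cublasLt64_11.dll",
    "cudnn64_8.dll",
    "cusolver64_10.dll",
]

if __name__ == "__main__":
    for name, directory in find_on_path(WANTED).items():
        print(f"{name}: {directory or 'NOT FOUND'}")
```

If `cusolver64_10.dll` resolves only to the CUDA 11.0 directory while the other libraries resolve to 11.2, the process is mixing files from two toolkit versions, which may be worth ruling out.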

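I have been trying different CUDA/cuDNN/TensorFlow combinations just to see what happens, but only cuda_11.2.0_460.89_win10 ships with an NVIDIA GPU driver for Windows that is recent enough to support the RTX 30xx series. There is still no cuDNN release built specifically for CUDA 11.2... maybe that is the problem...

Do you know how to get these working together?

I rolled back to CUDA 11.0 with the matching cuDNN 8.0.2 and TensorFlow 2.4.1, just to check again, and this combination: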

cudnn-11.0-windows-x64-v8.0.2.39.zip
cuda_11.0.2_451.48_win10.exe
latest stable TensorFlow 2.4.1
NVIDIA GPU driver updated to 461.40, because the 451.48 driver packaged with the CUDA installer above does not work with the RTX 3070

...gives:

2021-02-04 19:36:59.700433: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-02-04 19:36:59.700523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-02-04 19:36:59.700630: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.

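It finally started working with the latest cudnn-11.2-windows-x64-v8.1.0.77.zip, which was released recently, together with the 2.5 nightly, but apparently only in combination with cuda_11.2.0_460.89_win10.exe.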

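I had a similar problem when using TF 2.4, CUDA 11.0, and cuDNN 8.0. I don't know why a simple network worked in that configuration while a more complex one did not. Apparently my simple network was not using cuDNN?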
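
On the question of whether a network uses cuDNN at all: a model without recurrent or convolutional layers never calls it, and the Keras documentation lists conditions under which an `LSTM` layer takes the cuDNN path on a GPU. The sketch below encodes the layer-argument part of that list as a plain function. It is an illustration only: `lstm_can_use_cudnn` is a hypothetical helper, not a TensorFlow API, and it ignores the documented conditions on input masking and eager execution.

```python
def lstm_can_use_cudnn(activation="tanh", recurrent_activation="sigmoid",
                       recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Return True if these LSTM arguments meet the documented cuDNN criteria.

    The defaults mirror the tf.keras.layers.LSTM defaults, so a layer built
    with default arguments is eligible for the cuDNN kernel.
    """
    return (
        activation == "tanh"
        and recurrent_activation == "sigmoid"
        and recurrent_dropout == 0
        and not unroll
        and use_bias
    )


if __name__ == "__main__":
    # The LSTM layers in the reproduction script use default arguments.
    print(lstm_can_use_cudnn())
    # Adding recurrent dropout makes the layer fall back to the generic kernel.
    print(lstm_can_use_cudnn(recurrent_dropout=0.2))
```

This is consistent with the `CudnnRNN` node named in the traceback above: the LSTM layers in the reproduction script use default arguments, so they would be expected to take the cuDNN path.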

Anyway, after upgrading to TF 2.5, CUDA 11.2, and cuDNN 8.1 everything works. In the future it is best to check the compatible library versions on tensorflow.org.
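
The version pairs mentioned in this thread can be kept in a small lookup table and checked before installing anything. The sketch below is a hedged illustration, not an official tool. `TESTED_GPU_BUILDS`, `major_minor`, and `mismatches` are hypothetical names, and the table holds only the two combinations discussed here (TF 2.4 with CUDA 11.0 and cuDNN 8.0, TF 2.5 with CUDA 11.2 and cuDNN 8.1). The tested-configurations table on tensorflow.org remains the authoritative source.

```python
# Only the combinations discussed in this thread; extend from tensorflow.org.
TESTED_GPU_BUILDS = {
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def major_minor(version):
    """Reduce a version such as '2.5.0.dev20210125' to '2.5'."""
    return ".".join(version.split(".")[:2])


def mismatches(tf_version, cuda_version, cudnn_version):
    """Return a list of human-readable problems; an empty list means a match."""
    expected = TESTED_GPU_BUILDS.get(major_minor(tf_version))
    if expected is None:
        return [f"no table entry for TensorFlow {major_minor(tf_version)}"]
    problems = []
    if major_minor(cuda_version) != expected["cuda"]:
        problems.append(
            f"CUDA {major_minor(cuda_version)} installed, {expected['cuda']} expected"
        )
    if major_minor(cudnn_version) != expected["cudnn"]:
        problems.append(
            f"cuDNN {major_minor(cudnn_version)} installed, {expected['cudnn']} expected"
        )
    return problems


if __name__ == "__main__":
    # First setup from the question: TF 2.4.1 with CUDA 11.2.0 and cuDNN 8.0.5.
    print(mismatches("2.4.1", "11.2.0", "8.0.5"))
    # Combination that eventually worked: 2.5 nightly, CUDA 11.2.0, cuDNN 8.1.0.
    print(mismatches("2.5.0.dev20210125", "11.2.0", "8.1.0"))
```

For the first setup in the question the function flags the CUDA version, and for the combination that eventually worked it returns an empty list.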
