Recently I decided to migrate from Tensorflow (GPU variant) 1.14 to the current 2.0 release.
My current setup is:
- Tensorflow (GPU variant) 2.0
- cuDNN 7.6.4
- CUDA 10
- Python 3.6
- IDE: Visual Studio 2019
I did expect some pain, but this caught me off guard.
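When I try to run one of my (now adapted) 1.14 projects, the model builds without problems and training starts smoothly, only to come to a complete halt after the third step. The same project runs fine on the CPU variant of Tensorflow 2.0, but there training all the models takes orders of magnitude longer.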
Here is what I have tried so far:
- Changing hyperparameters
- Reinstalling CUDA
- Reinstalling Tensorflow
- Reinstalling cuDNN
- Disabling validation
- Checking path variables
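One of the checks above is the path variables. The sketch below is one way to make that check repeatable; it is an illustration, not part of the original troubleshooting. It walks `PATH` and reports which directories contain each DLL, assuming the relevant libraries are the ones named in the session log in this post (cudart64_100.dll, cublas64_100.dll, cudnn64_7.dll, cupti64_100.dll). `find_dlls_on_path` and `CUDA_DLLS` are hypothetical names, and the DLL list is not guaranteed to be complete. It uses only the standard library and runs on Python 3.6.

```python
import os
from typing import Dict, Iterable, List, Optional

# DLL names copied from the session log in this post (assumed, not exhaustive).
CUDA_DLLS = [
    "cudart64_100.dll",
    "cublas64_100.dll",
    "cudnn64_7.dll",
    "cupti64_100.dll",
]


def find_dlls_on_path(
    dll_names: Iterable[str], path: Optional[str] = None
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each DLL name to every PATH directory containing it."""
    if path is None:
        path = os.environ.get("PATH", "")
    found: Dict[str, List[str]] = {name: [] for name in dll_names}
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        for name in found:
            if os.path.isfile(os.path.join(directory, name)):
                found[name].append(directory)
    return found


if __name__ == "__main__":
    for name, dirs in find_dlls_on_path(CUDA_DLLS).items():
        print(f"{name}: {', '.join(dirs) if dirs else 'NOT FOUND'}")
```

A DLL with no directories listed is not reachable through `PATH`; one listed in several directories may point to leftovers from an earlier CUDA or cuDNN install. Since the log in this post reports each of these DLLs as opened successfully, the check would most likely come back clean on this particular machine.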
None of these made any difference. My only clue is this warning message:
Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
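I never ran into this with Tf 1.14 and am somewhat confused. I know CUDA itself works, because I compiled and ran several of the NVIDIA samples, so the only real remaining suspect is Tensorflow or the way it handles the GPU.
But I don't know how to proceed from here.
The session log is below: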
2019-11-27 01:03:57.910895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\frame.py:4117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,
2019-11-27 01:04:02.247959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-11-27 01:04:02.277414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.282378: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.286653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:02.289629: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-11-27 01:04:02.295084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:0a:00.0
2019-11-27 01:04:02.299843: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-27 01:04:02.303965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-27 01:04:03.043700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 01:04:03.047132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-11-27 01:04:03.049453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-11-27 01:04:03.052642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6382 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 154, 64)           896000
_________________________________________________________________
conv1d (Conv1D)              (None, 150, 64)           20544
_________________________________________________________________
flatten (Flatten)            (None, 9600)              0
_________________________________________________________________
dense (Dense)                (None, 300)               2880300
_________________________________________________________________
dense_1 (Dense)              (None, 150)               45150
_________________________________________________________________
dense_2 (Dense)              (None, 70)                10570
_________________________________________________________________
dense_3 (Dense)              (None, 10)                710
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 22
=================================================================
Total params: 3,853,296
Trainable params: 3,853,296
Non-trainable params: 0
_________________________________________________________________
Train for 10 steps, validate for 50 steps
Epoch 1/40
2019-11-27 01:04:06.199581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-11-27 01:04:06.430358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-27 01:04:07.180709: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-11-27 01:04:07.425377: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2019-11-27 01:04:07.431736: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_100.dll
1/10 [==>...........................] - ETA: 32s - loss: 0.6933 - accuracy: 0.4375 - categorical_accuracy: 0.4375 - precision: 0.4375 - recall: 0.4375
2019-11-27 01:04:07.655586: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 148 kernel records, 21 memcpy records.
WARNING: Logging before flag parsing goes to stderr.
W1127 01:04:07.730274 5696 callbacks.py:244] Method (on_train_batch_end) is slow compared to the batch update (0.138531). Check your callbacks.
3/10 [========>.....................] - ETA: 9s - loss: 0.6167 - accuracy: 0.7000 - categorical_accuracy: 0.7000 - precision: 0.7000 - recall: 0.7000
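I was affected by the same problem. It turned out that, in my case, the driver was the culprit.
I first tried tensorflow-gpu with CUDA 10 and the latest NVIDIA driver, and training hung at random points during the training steps, with the same ptxas message you show.
Next, I switched Tensorflow from 2.0 to 1.15 and 1.14, adjusting the Python version accordingly, and found that none of it helped.
After uninstalling the driver and installing an older one (432.00), the problem went away, although I still see the ptxas warning.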
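Before and after a downgrade like this, it helps to know which driver is actually installed. The sketch below is an illustration, not part of the answer. It assumes `nvidia-smi` is on `PATH` and accepts `--query-gpu=driver_version --format=csv,noheader`. It compares against 432.00 only because that is the version reported as working here; 432.00 is not a documented cutoff. `parse_driver_version`, `query_driver_version`, `is_newer_than_known_good` and `KNOWN_GOOD_DRIVER` are hypothetical names. It avoids `capture_output=`/`text=` so it also runs on the Python 3.6 used in the question.

```python
import subprocess
from typing import Optional, Tuple

# Version reported as working in the answer above; NOT an official cutoff.
KNOWN_GOOD_DRIVER = "432.00"


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '441.41' into a comparable tuple like (441, 41)."""
    return tuple(int(part) for part in text.strip().split("."))


def query_driver_version() -> Optional[str]:
    """Return the first GPU's driver version reported by nvidia-smi, or None."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # Python 3.6 has no text=/capture_output=
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # nvidia-smi is missing or returned an error
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def is_newer_than_known_good(version: str, known_good: str = KNOWN_GOOD_DRIVER) -> bool:
    """True if `version` is strictly newer than `known_good`."""
    return parse_driver_version(version) > parse_driver_version(known_good)


if __name__ == "__main__":
    current = query_driver_version()
    if current is None:
        print("Could not query nvidia-smi; check the driver version manually.")
    else:
        print("Installed driver:", current)
        print("Newer than", KNOWN_GOOD_DRIVER + ":", is_newer_than_known_good(current))
```

A `True` result does not prove the driver is at fault; it only indicates that the downgrade described in this answer may be worth trying.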