
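I'm training a Keras model and needed to switch machines for more computing power (from a Windows i3 to an Ubuntu i7). The problem is that my code runs fine on Windows, but on Ubuntu it prints the following errors and stops before even running the first epoch. Here is the full output: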

/home/willylutz/PycharmProjects/hiv_image_analysis/venv/bin/python /home/willylutz/PycharmProjects/hiv_image_analysis/main.py 
2022-09-19 09:36:52.801711: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-19 09:36:52.956260: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-19 09:36:53.502748: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/willylutz/PycharmProjects/hiv_image_analysis/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-09-19 09:36:53.502794: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/willylutz/PycharmProjects/hiv_image_analysis/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-09-19 09:36:53.502800: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Found 480 files belonging to 2 classes.
Using 384 files for training.
2022-09-19 09:37:00.058171: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-09-19 09:37:00.058202: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (zhang): /proc/driver/nvidia/version does not exist
2022-09-19 09:37:00.067388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Found 480 files belonging to 2 classes.
Using 96 files for validation.
['INF', 'NI']
2022-09-19 09:37:10.149236: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:390] Filling up shuffle buffer (this may take a while): 373 of 512
2022-09-19 09:37:10.197351: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:415] Shuffle buffer filled.
(64, 1024, 1024, 3)
2022-09-19 09:37:16.377367: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 4294967296 exceeds 10% of free system memory.
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)


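There is a bug in Keras/TensorFlow 2.9 and 2.10 that makes preprocessing layers (such as Rescaling) very slow: https://github.com/tensorflow/tensorflow/issues/56242

Try the model without the Rescaling layer. If you want to use this or similar preprocessing layers, you should use TF 2.8.3 or earlier.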
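As a sketch of the first suggestion, assuming the images come from a `tf.data` pipeline like the one `tf.keras.utils.image_dataset_from_directory` produces, the scaling can be moved out of the model and into the pipeline with `Dataset.map`. The tiny random dataset below is a hypothetical stand-in for the real one, and treating 2.9 and 2.10 as the affected versions simply follows the answer above.

```python
import tensorflow as tf

# Parse "major.minor" from version strings such as "2.9.1" or "2.10.0-rc1".
MAJOR_MINOR = tuple(int(p) for p in tf.__version__.split(".")[:2])

# Versions that the answer above reports as having slow preprocessing layers.
AFFECTED = MAJOR_MINOR in {(2, 9), (2, 10)}
print(f"TensorFlow {tf.__version__}; Rescaling layer possibly slow: {AFFECTED}")


def rescale(images, labels):
    """Map uint8-range pixel values from [0, 255] to [0, 1] in the pipeline."""
    return tf.cast(images, tf.float32) / 255.0, labels


# Hypothetical stand-in for the real training dataset:
# 4 tiny 8x8 RGB "images" with integer pixels in [0, 255], batched by 2.
images = tf.random.uniform((4, 8, 8, 3), minval=0, maxval=256, dtype=tf.int32)
labels = tf.constant([0, 1, 0, 1])
train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(2)

# Do the scaling here instead of with a Rescaling layer inside the model.
train_ds = train_ds.map(rescale, num_parallel_calls=tf.data.AUTOTUNE)
```

With this approach the model itself contains no Rescaling layer, so the same `rescale` function also has to be applied to the validation dataset and to any data fed to the model at inference time.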


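I had the same problem. I solved it by copying the source code of the Keras Rescaling layer into my code as my "own" Rescaling class. I changed math_ops.cast() to tf.cast(), and it worked like a charm: no warnings, and the code runs faster.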

import tensorflow as tf


class Rescaling(tf.keras.layers.Layer):
    """Multiply inputs by `scale` and add `offset`.

    For instance:
    1. To rescale an input in the `[0, 255]` range
       to be in the `[0, 1]` range, you would pass `scale=1./255`.
    2. To rescale an input in the `[0, 255]` range to be in the `[-1, 1]`
       range, you would pass `scale=1./127.5, offset=-1`.

    The rescaling is applied both during training and inference.

    Input shape:
        Arbitrary.
    Output shape:
        Same as input.

    Args:
        scale: Float, the scale to apply to the inputs.
        offset: Float, the offset to apply to the inputs.
        name: A string, the name of the layer.
    """

    def __init__(self, scale, offset=0., name=None, **kwargs):
        super().__init__(name=name, **kwargs)
        self.scale = scale
        self.offset = offset

    def call(self, inputs):
        # Compute in the layer's compute dtype (float32 unless a mixed
        # precision policy is set), using the public tf.cast instead of
        # the internal math_ops.cast.
        dtype = self.compute_dtype
        scale = tf.cast(self.scale, dtype)
        offset = tf.cast(self.offset, dtype)
        return tf.cast(inputs, dtype) * scale + offset

    def compute_output_shape(self, input_shape):
        return input_shape

    def get_config(self):
        config = {
            'scale': self.scale,
            'offset': self.offset,
        }
        base_config = super().get_config()
        return dict(list(base_config.items()) + list(config.items()))



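In short, TensorRT optimizes the model's internal graph, which makes the model execute faster. Typically, you first save the TensorFlow model to ONNX and then convert the model from ONNX to TensorRT. Recent work by TF and NVIDIA makes it possible to use this directly to run TensorFlow models faster in inference mode (when running on a computer with an NVIDIA GPU).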

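Previously, you had to build TF from source to enable the TensorRT integration. It seems that in the latest versions it is now enabled by default (I assume you were using TF v2.11 before downgrading to TF v2.8).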
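To check which case applies on a given machine, the following sketch (assuming TF 2.3 or later, where `tf.sysconfig.get_build_info()` is available) reports whether the installed TensorFlow build was compiled with CUDA and TensorRT support, and whether TensorFlow can actually see a GPU. The `is_cuda_build` and `is_tensorrt_build` keys are what recent builds report; `.get()` is used in case a particular build omits them. On a machine like the one in the question, which logs `CUDA_ERROR_NO_DEVICE`, the GPU list would be expected to be empty.

```python
import tensorflow as tf

# Build metadata of the installed TensorFlow package (a dict-like mapping).
build_info = tf.sysconfig.get_build_info()
cuda_build = bool(build_info.get("is_cuda_build", False))
tensorrt_build = bool(build_info.get("is_tensorrt_build", False))

# TF-TRT can only be used when an NVIDIA GPU is visible to TensorFlow.
gpus = tf.config.list_physical_devices("GPU")

print(f"TensorFlow {tf.__version__}")
print(f"CUDA build: {cuda_build}, TensorRT build: {tensorrt_build}")
print(f"Visible GPUs: {len(gpus)}")
if tensorrt_build and not gpus:
    print("TensorRT support is compiled in, but no NVIDIA GPU is visible, "
          "so TF-TRT will not be used on this machine.")
```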




