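I am having trouble implementing an input pipeline with the new tf.data TensorFlow classes.
Specifically, when I include a convolution operation in the preprocessing, which I add to the pipeline with the map method, I get the following error: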
tensorflow.python.framework.errors_impl.UnimplementedError: Generic conv implementation only supports NHWC tensor format for now.
[[{{node conv_debug}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](conv_debug-0-TransposeNHWCToNCHW-LayoutOptimizer, ArithmeticOptimizer/FoldMultiplyIntoConv_scaled_conv_debug_Const)]]
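When I exclude the convolution from the pipeline, everything works as expected.
Below I attach the minimal code needed to reproduce the problem.
Tested with 3 configurations:
- TensorFlow 1.12.0, CUDA 10.0, cuDNN 7.4.1: the error occurs.
- TensorFlow 1.11.0, CUDA 9.0, cuDNN 7.3.1: the error occurs.
- TensorFlow 1.8.0, CUDA 8.0, cuDNN 6.0: it works.
Am I doing something wrong, or is this a CUDA/cuDNN-related issue?
Thanks!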
import numpy as np
import tensorflow as tf
image_height, image_width = 100, 200

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def serialize_to_record(record_name, label, image):
    """Create a data record and store it"""
    writer = tf.python_io.TFRecordWriter(record_name)
    # tobytes() is the non-deprecated equivalent of tostring()
    image_raw = image.tobytes()
    label_raw = label
    sample = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_raw),
        'label_raw': _bytes_feature(label_raw)}))
    writer.write(sample.SerializeToString())
    writer.close()

def _dataset_parser(record):
    """Read and deserialize a tensorflow record"""
    parsed = tf.parse_single_example(
        record,
        features={'image_raw': tf.FixedLenFeature([], tf.string),
                  'label_raw': tf.FixedLenFeature([], tf.string)})
    image_ = tf.decode_raw(parsed['image_raw'], tf.uint8)
    image_.set_shape(image_height * image_width * 3)
    image_ = tf.reshape(image_, (image_height, image_width, 3))
    image = tf.cast(image_, tf.float32) / 255.0
    label = parsed['label_raw']
    return {'image': image, 'label': label}

def _dataset_preprocessor(datum):
    """dummy preprocessor consisting of a convolution with a random kernel"""
    image = datum['image']
    kernel = np.random.rand(5, 5, 3, 3)
    kernel_tf = tf.constant(kernel, dtype=tf.float32)
    # conv2d expects a 4-D input, so add and later remove a batch dimension
    image = tf.expand_dims(image, axis=0)
    image = tf.nn.conv2d(image, kernel_tf, [1, 1, 1, 1], padding='SAME', name='conv_debug')
    image = tf.squeeze(image, axis=0)
    datum['image'] = image
    return datum

def _dataset_operator(record):
    """define a sequence of operations to run on the dataset"""
    datum = _dataset_parser(record)
    datum = _dataset_preprocessor(datum)
    return datum

def _dataset_operator_noconv(record):
    """same sequence of operations, without the convolution"""
    datum = _dataset_parser(record)
    return datum

if __name__ == '__main__':
    # create a random tensor
    image = (255.0 * np.random.rand(image_height, image_width, 3)).astype(np.uint8)
    record_path = 'example.tfrecord'
    # store a tf record to disk (bytes label so that BytesList also accepts it on Python 3)
    serialize_to_record(record_path, label=b'example', image=image)
    # build a dummy dataset of copies of the generated image
    N = 32
    dataset_filenames = [record_path for n in range(N)]
    dataset = tf.data.TFRecordDataset(dataset_filenames)
    # add parser and preprocessor to the pipeline
    include_convolution_to_pipeline = True
    if include_convolution_to_pipeline:
        dataset = dataset.map(_dataset_operator)
    else:
        dataset = dataset.map(_dataset_operator_noconv)
    # complete pipeline for iteratively visiting the dataset in batches of 8 samples
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(8)
    dataset = dataset.repeat()
    iterator = dataset.make_initializable_iterator()
    next_data = iterator.get_next()
    # init session and go for the first batch
    sess = tf.Session()
    sess.run(iterator.initializer)
    next_data_ = sess.run(next_data)
    print('***')
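As the error message states, the convolution operation expects the NHWC data format. Whichever data format you choose, it still needs batch_size as one of the dimensions. However, you are applying the map function before batching. That is not usually the standard order, but if you need the convolution, you have to apply the map function after batching.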
# dataset = dataset.map(_dataset_operator)  # moved after batch()
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(8)
dataset = dataset.map(_dataset_operator)
dataset = dataset.repeat()
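A minimal sketch of what mapping after batching could look like, assuming the per-record parsing stays before `batch()` and only the convolution moves after it. The name `conv_on_batch` is hypothetical and not part of the question's code; because the input is already batched, the `expand_dims`/`squeeze` pair is dropped. Whether this ordering actually avoids the error is not verified here.

```python
import numpy as np
import tensorflow as tf


def conv_on_batch(datum):
    """Hypothetical batched variant of _dataset_preprocessor.

    Expects datum['image'] shaped [batch, height, width, 3] (NHWC).
    """
    kernel_tf = tf.constant(np.random.rand(5, 5, 3, 3), dtype=tf.float32)
    image = tf.nn.conv2d(datum['image'], kernel_tf, [1, 1, 1, 1],
                         padding='SAME', name='conv_debug_batched')
    return {'image': image, 'label': datum['label']}


# Assumed usage with the question's pipeline:
#   dataset = dataset.map(_dataset_parser)
#   dataset = dataset.shuffle(buffer_size=100)
#   dataset = dataset.batch(8)
#   dataset = dataset.map(conv_on_batch)
#   dataset = dataset.repeat()
```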
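This is an issue with TensorFlow's layout optimizer.
TensorFlow's "map" function executes its graph on the CPU, and placing the tensors created in the map function anywhere else confuses the layout optimizer.
Wrapping the tensor creation inside the map function in tf.device("/cpu:0") resolves the layout optimizer's confusion. Another option is to disable the layout optimizer, which may cost extra training time (giving up layout optimization for the whole graph just to run the "map" stage is probably not worthwhile).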
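The two options above could be sketched as follows. The function name `conv_preprocessor_on_cpu` and the variable `session_config` are hypothetical; the function is assumed to replace `_dataset_preprocessor` from the question. The protobuf classes are imported from their defining modules (`tf.ConfigProto` in TF 1.x is the same `ConfigProto` class). Whether a session-level rewriter setting also reaches the graph of the map function is an assumption that was not checked here.

```python
import numpy as np
import tensorflow as tf
from tensorflow.core.protobuf import config_pb2, rewriter_config_pb2


# Option 1: pin every op created inside the map function to the CPU.
def conv_preprocessor_on_cpu(datum):
    """Same dummy convolution as _dataset_preprocessor, placed on /cpu:0."""
    with tf.device('/cpu:0'):
        kernel_tf = tf.constant(np.random.rand(5, 5, 3, 3), dtype=tf.float32)
        image = tf.expand_dims(datum['image'], axis=0)
        image = tf.nn.conv2d(image, kernel_tf, [1, 1, 1, 1],
                             padding='SAME', name='conv_debug')
        image = tf.squeeze(image, axis=0)
    return {'image': image, 'label': datum['label']}


# Option 2: switch the layout optimizer off in the session configuration.
session_config = config_pb2.ConfigProto()
session_config.graph_options.rewrite_options.layout_optimizer = (
    rewriter_config_pb2.RewriterConfig.OFF)
# In the question's TF 1.x script this would be passed as:
#   sess = tf.Session(config=session_config)
```

Option 1 keeps the optimizer active for the rest of the graph, which is presumably why the answer prefers it.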
There is already an open issue about this:
https://github.com/tensorflow/tensorflow/issues/26411
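Since this is only a workaround, I expect a more robust solution (executing the "map" tensors on the GPU, a fix to the layout optimizer, etc.) to appear in upcoming TF releases. For now, though, the suggested workaround solved my problem without my having to deal with any layout de-optimization issues.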