How to convert a data generator into a TensorFlow queue for multi-GPU training (cifar10_multi_gpu_train.py example)

I have a data generator that I use to train a CNN, and it works fine. Now I want to speed up training on 2 GPUs (in 1 PC) by following cifar10_multi_gpu_train.py. (https://www.tensorflow.org/programmers_guide/threading_and_queues)

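Questions:

1) How do I convert the data generator into a queue? A data item is (image file path, output). The whole dataset is a list of data items. A batch is a portion of the whole dataset. How do I put this into tensors, as in the following: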

batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
[images, labels], capacity=2 * FLAGS.num_gpus)

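2) What does the queue contain?

1.a) Does the queue hold the whole dataset or a single batch?

1.b) It seems to me that in the cifar10 example the queue holds 1 batch. But then how does it loop through the whole dataset?

1.c) If the queue holds the whole dataset, what data does each GPU thread get? In that case I am not sure I understand how concurrent GPU training is possible, because the loss and gradients for every batch depend on the same model state, yet the loss and gradients for the next batch can only be computed after the previous batch has finished and the model weights have been updated.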

One approach that may work:

Build the lists

image_list = [("file%d" % i) for i in range(100)]
label_list = read_label_list_from_disk(path)

Convert image_list and label_list to tensors (image_tensor and label_tensor below)

Producer

image_filename_queue, label_queue = tf.train.slice_input_producer(
    [image_tensor, label_tensor], ..)

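Reader

# slice_input_producer returns dequeued tensors, not queues, so the file
# is read directly instead of through tf.WholeFileReader().read(queue).
value = tf.read_file(image_filename_queue)
images = tf.image.decode_png(value)
# Note: tf.train.batch needs a fully defined image shape, so resize or
# call set_shape on `images` before batching.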

Batching

image_batch, label_batch = tf.train.batch(
    [images, label_queue], batch_size=batch_size)
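
To connect this back to the original data generator: the producer and batching steps above play roughly the same role as a background thread that drains a generator into a bounded queue. The following is a minimal pure-Python sketch of that idea, not TensorFlow code. The names `batch_generator`, `start_feeder` and `consume`, and the made-up labels `i % 10`, are assumptions for illustration only. It shows that the queue only ever holds a few batches at a time, while the feeder is what loops over the whole dataset, epoch after epoch.

```python
import queue
import random
import threading


def batch_generator(items, batch_size, num_epochs, seed=0):
    """Hypothetical data generator: yields lists of (image_path, label) pairs."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        order = list(items)
        rng.shuffle(order)  # reshuffle once per epoch
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]


_DONE = object()  # sentinel that marks the end of the data


def start_feeder(generator, capacity):
    """Drain `generator` into a bounded queue from a background thread."""
    q = queue.Queue(maxsize=capacity)

    def feed():
        for batch in generator:
            q.put(batch)  # blocks while the queue is full
        q.put(_DONE)

    threading.Thread(target=feed, daemon=True).start()
    return q


def consume(q):
    """Yield batches from the queue until the feeder signals completion."""
    while True:
        batch = q.get()
        if batch is _DONE:
            return
        yield batch


# Made-up dataset: 100 file names with illustrative labels.
items = [("file%d" % i, i % 10) for i in range(100)]
batch_queue = start_feeder(
    batch_generator(items, batch_size=10, num_epochs=2), capacity=4)
batches = list(consume(batch_queue))
```

The queue never holds more than `capacity` batches, yet the consumer still sees every item once per epoch, because the feeder keeps refilling the queue as batches are taken out.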

Prefetching

batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
[image_batch, label_batch], capacity=2*gpus)
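
On the question of how several GPUs can train concurrently: in cifar10_multi_gpu_train.py each GPU tower dequeues its own batch from the prefetch queue, all towers compute gradients against the same copy of the weights, and the gradients are averaged before one shared update is applied. The sketch below illustrates that synchronous scheme in plain Python with a toy one-parameter model. The model, the data, and the names `tower_gradient` and `train_step` are assumptions for illustration, and the towers run sequentially here, whereas on real hardware they would run in parallel.

```python
import queue

NUM_GPUS = 2  # assumed number of towers


def tower_gradient(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_step(w, batch_queue, learning_rate):
    """One synchronous step: every tower takes its own batch but uses the same w."""
    grads = [tower_gradient(w, batch_queue.get()) for _ in range(NUM_GPUS)]
    avg_grad = sum(grads) / len(grads)  # average the gradients across towers
    return w - learning_rate * avg_grad  # single shared weight update


# Toy data for y = 3 * x, split into two batches of 4 examples each.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
batch_queue = queue.Queue(maxsize=2 * NUM_GPUS)  # same capacity rule as above
batch_queue.put(data[:4])
batch_queue.put(data[4:])

w0 = 0.0
w1 = train_step(w0, batch_queue, learning_rate=0.01)
```

With equally sized batches, the averaged tower gradient equals the gradient over the combined batch, so one step with 2 towers behaves like one step with a batch twice as large. No tower has to wait for another tower's weight update within a step.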

I hope this also explains the concept of queues in TensorFlow.
