Multi-GPU does not seem to work in TensorFlow 1.0

I am using TensorFlow 1.0, and I wrote a simple program to measure performance. I have a trivial model, as follows:

import numpy as np
import tensorflow as tf

def model(example_batch):
    h1 = tf.layers.dense(inputs=example_batch, units=64, activation=tf.nn.relu)
    h2 = tf.layers.dense(inputs=h1, units=2)
    return h2

and a simple function to run the simulation:

def testPerformanceFromMemory(model, iter=1000, num_cores=2):
  example_batch = tf.placeholder(np.float32, shape=(64, 128))
  for core in range(num_cores):
    with tf.device('/gpu:%d'%core):
      prediction = model(example_batch)
  init_op = tf.global_variables_initializer()
  sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
  sess.run(init_op)
  tf.train.start_queue_runners(sess=sess)
  input_array = np.random.random((64,128))
  for step in range(iter):
    myprediction = sess.run(prediction, feed_dict={example_batch:input_array})

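If I run the Python script and then run the nvidia-smi command, I can see that gpu0 shows high utilization while gpu1 stays at 0%.

I don't know why my example does not run on multiple GPUs.

P.S. If I run the CIFAR-10 example from the TensorFlow repository, it does run in multi-GPU mode.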

Edit: As mrry pointed out, I was overwriting prediction, so here is the corrected version:

def testPerformanceFromMemory(model, iter=1000, num_cores=2):
  example_batch = tf.placeholder(np.float32, shape=(64, 128))
  prediction = []
  for core in range(num_cores):
    with tf.device('/gpu:%d'%core):
      prediction.append([model(example_batch)])
  init_op = tf.global_variables_initializer()
  sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
  sess.run(init_op)
  tf.train.start_queue_runners(sess=sess)
  input_array = np.random.random((64,128))
  for step in range(iter):
    myprediction = sess.run(prediction, feed_dict={example_batch:input_array})

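Looking at your program, you create several parallel subgraphs (often called "towers") on different GPU devices, but you overwrite the prediction tensor on each iteration of the first for loop: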

for core in range(num_cores):
  with tf.device('/gpu:%d'%core):
    prediction = model(example_batch)
# ...
for step in range(iter):
  myprediction = sess.run(prediction, feed_dict={example_batch:input_array})

As a result, when you call sess.run(prediction, ...), you only run the subgraph created in the final iteration of the first for loop, and that subgraph runs on just one GPU.
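
The overwrite can be seen in isolation with a minimal plain-Python sketch that does not need TensorFlow. The names build_tower, build_overwriting, and build_collecting are hypothetical and exist only for illustration; build_tower returns a label where the real code would return a tensor built under tf.device.

```python
def build_tower(core):
    # Hypothetical stand-in for model(example_batch) built under
    # tf.device('/gpu:%d' % core); returns a label instead of a tensor.
    return "tower on /gpu:%d" % core


def build_overwriting(num_cores):
    # Mirrors the original loop: each iteration rebinds the same name,
    # so only the tower from the last iteration is still referenced.
    prediction = None
    for core in range(num_cores):
        prediction = build_tower(core)
    return prediction


def build_collecting(num_cores):
    # Mirrors the corrected loop: every tower is appended to a list,
    # so all of them remain available to fetch together.
    prediction = []
    for core in range(num_cores):
        prediction.append(build_tower(core))
    return prediction


print(build_overwriting(2))
print(build_collecting(2))
```

In the overwriting version only one tower survives the loop, which is why a single fetch touches a single device. Passing the whole list to sess.run, as the corrected function in the question does, fetches every tower in one call.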
