How to do selective backpropagation within a mini-batch in TensorFlow

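Recently I have been working on a project: predicting an object's future trajectory from its past trajectory using an LSTM in TensorFlow. (Here, a trajectory means a sequence of 2D positions.)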

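The input to the LSTM is, of course, the past trajectory, and the output is the future trajectory.

The mini-batch size is fixed during training. However, the number of past trajectories in a mini-batch can vary. For example, let the mini-batch size be 10. If I only have 4 past trajectories in the current training iteration, then 6 of the 10 slots in the mini-batch are padded with zeros.

When computing the loss for backpropagation, I set the loss of those 6 padded slots to zero, so that only the 4 real trajectories contribute to backpropagation.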

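My concern is that TensorFlow still seems to compute gradients for the 6 padded slots even though their loss is zero. As a result, training gets slower as the mini-batch size increases, even when I use the same training data.

I also tried using tf.where when computing the loss. However, the training time did not decrease.

How can I reduce the training time?

Here is my pseudocode for training.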

# For each frame in a sequence
for f in range(pred_length):
    # For each element in a batch
    for b in range(batch_size):

        with tf.variable_scope("rnnlm") as scope:
            if f > 0 or b > 0:
                scope.reuse_variables()
            # for each pedestrian in an element
            for p in range(MNP):
                # ground-truth position
                cur_gt_pose = ...
                # loss mask
                loss_mask_ped = ...  # '1' or '0'
                # go through RNN decoder
                output_states_dec_list[b][p], zero_states_dec_list[b][p] = cell_dec(
                    cur_embed_frm_dec, zero_states_dec_list[b][p])
                # fully connected layer for output
                cur_pred_pose_dec = tf.nn.xw_plus_b(output_states_dec_list[b][p], output_wd, output_bd)
                # go through embedding function for the next input
                prev_embed_frms_dec_list[b][p] = tf.reshape(
                    tf.nn.relu(tf.nn.xw_plus_b(cur_pred_pose_dec, embedding_wd, embedding_bd)),
                    shape=(1, rnn_size))
                # calculate MSE loss
                mse_loss = tf.reduce_sum(tf.pow(tf.subtract(cur_pred_pose_dec, cur_gt_pose), 2.0))
                # only valid ped's traj contributes to the loss
                self.loss += tf.multiply(mse_loss, loss_mask_ped)

I think you are looking for the function tf.stop_gradient. With it, you can do something like tf.where(loss_mask, tensor, tf.stop_gradient(tensor)) to get the desired result, assuming the dimensions are correct.
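
To make the tf.where / tf.stop_gradient pattern concrete, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution (the question's code uses the 1.x API), and the tensors pred, target and loss_mask are made-up toy values, not taken from the question. It only shows that padded rows receive a zero gradient; it does not by itself show that any computation is skipped.

```python
import tensorflow as tf

# Hypothetical toy data: 4 batch rows with 2 coordinates each.
pred = tf.Variable([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
target = tf.zeros_like(pred)
# True marks a real trajectory, False marks a zero-padded row.
loss_mask = tf.constant([True, True, False, False])

with tf.GradientTape() as tape:
    # Add a trailing axis so the per-row mask broadcasts over coordinates
    # (tf.where broadcasts its arguments in TensorFlow 2.x).
    gated = tf.where(loss_mask[:, tf.newaxis], pred, tf.stop_gradient(pred))
    loss = tf.reduce_sum(tf.square(gated - target))

# Rows where the mask is False went through tf.stop_gradient,
# so their gradient entries are zero.
grad = tape.gradient(loss, pred)
print(grad.numpy())
```

Note that the loss value itself still includes the padded rows here; only their gradients are blocked. If the loss value matters, the mask has to be applied to the loss as well.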

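However, it looks like this may not be your actual problem. You seem to be defining new graph nodes for every item in your batch. That is not how TensorFlow is meant to be used: regardless of the batch size, there should be a single pre-built graph that performs a fixed computation. You should definitely not define new nodes for each element in the batch, because that cannot exploit parallelism efficiently.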
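
As a rough illustration of the "one graph for the whole batch" idea, the sketch below replaces the per-element Python loop with batched ops. It assumes TensorFlow 2.x; the sizes, the random stand-in tensors, and the function names masked_loss and compacted_loss are all hypothetical, and the RNN decoder step is left out and represented only by a states tensor. The second function shows one possible way to avoid work on padded rows, by dropping them with tf.boolean_mask before the remaining ops. Whether that actually saves time depends on the model and hardware.

```python
import tensorflow as tf

batch_size, rnn_size = 10, 8  # hypothetical sizes

# Random stand-ins for the decoder outputs and ground-truth positions.
states = tf.random.normal([batch_size, rnn_size], seed=0)
gt_pose = tf.random.normal([batch_size, 2], seed=2)
# Output layer parameters (playing the role of output_wd / output_bd).
output_w = tf.Variable(tf.random.normal([rnn_size, 2], seed=1))
output_b = tf.Variable(tf.zeros([2]))
# 4 real trajectories followed by 6 zero-padded slots.
loss_mask = tf.constant([1.0] * 4 + [0.0] * 6)

@tf.function
def masked_loss(states, gt_pose, loss_mask):
    # A single matmul handles every batch element at once.
    pred = tf.matmul(states, output_w) + output_b
    per_row = tf.reduce_sum(tf.square(pred - gt_pose), axis=-1)
    # Padded rows are multiplied by 0 and drop out of the sum.
    return tf.reduce_sum(per_row * loss_mask)

@tf.function
def compacted_loss(states, gt_pose, loss_mask):
    # Remove padded rows first, so later ops only see valid rows.
    keep = loss_mask > 0.0
    valid_states = tf.boolean_mask(states, keep)
    valid_gt = tf.boolean_mask(gt_pose, keep)
    pred = tf.matmul(valid_states, output_w) + output_b
    return tf.reduce_sum(tf.square(pred - valid_gt))

loss_a = masked_loss(states, gt_pose, loss_mask)
loss_b = compacted_loss(states, gt_pose, loss_mask)
print(float(loss_a), float(loss_b))
```

Both functions should give the same loss up to floating-point rounding; the difference is that the second one never runs the output layer on the padded rows.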
