How do I modify the TensorFlow Sequence2Sequence model to implement a bidirectional LSTM instead of a unidirectional one?



> Refer to this post for background on the problem: Does the TensorFlow embedding_attention_seq2seq method implement a bidirectional RNN encoder by default?

I am working with the same model and want to replace the unidirectional LSTM layer with a bidirectional layer. I realize I have to use static_bidirectional_rnn instead of static_rnn, but I am getting an error because of a mismatch in tensor shapes.

I replaced the following line:

encoder_outputs, encoder_state = core_rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)

with this line:

encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)

That gives me the following error:

InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5,1,256] vs. [16,1,1,256] [[Node: gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape, gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape_1)]]

I understand that the outputs of the two methods are different, but I do not know how to modify the attention code to account for that. How do I send both the forward and backward states to the attention module - do I concatenate the two hidden states?

From the error message I can see that the batch sizes of two tensors don't match somewhere: one is 32 and the other is 16. I suppose this is because the outputs of the bidirectional RNN are twice as wide as those of the unidirectional one (the forward and backward outputs are concatenated along the depth axis), and the downstream code was not adjusted accordingly.
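
In the stock tf.contrib.legacy_seq2seq code, embedding_attention_seq2seq builds attention_states by reshaping each encoder output to [-1, 1, cell.output_size]; with the doubled-depth bidirectional outputs, that reshape folds the extra depth into the batch dimension (16 becomes 32), which matches the error above. A minimal sketch of the adjustment, assuming your copy of the function builds attention_states the same way (the variable names are the ones from your replacement line):

encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(
    encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)

# Each bidirectional output is [batch, 2 * output_size] (forward and backward
# concatenated), so build the attention states with the doubled depth instead
# of cell.output_size.
top_states = [tf.reshape(e, [-1, 1, 2 * encoder_cell.output_size])
              for e in encoder_outputs]
attention_states = tf.concat(top_states, 1)

The final forward and backward states likewise have to be combined before the unidirectional decoder can use them, which is what the helper below does.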

> How do I send both the forward and backward states to the attention module - do I concatenate the two hidden states?

You can refer to this code:

def _reduce_states(self, fw_st, bw_st):
  """Add to the graph a linear layer to reduce the encoder's final FW and BW state
  into a single initial state for the decoder. This is needed because the encoder
  is bidirectional but the decoder is not.

  Args:
    fw_st: LSTMStateTuple with hidden_dim units.
    bw_st: LSTMStateTuple with hidden_dim units.

  Returns:
    state: LSTMStateTuple with hidden_dim units.
  """
  hidden_dim = self._hps.hidden_dim
  with tf.variable_scope('reduce_final_st'):

    # Define weights and biases to reduce the cell and reduce the state
    w_reduce_c = tf.get_variable('w_reduce_c', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
    w_reduce_h = tf.get_variable('w_reduce_h', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
    bias_reduce_c = tf.get_variable('bias_reduce_c', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
    bias_reduce_h = tf.get_variable('bias_reduce_h', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)

    # Apply linear layer
    old_c = tf.concat(axis=1, values=[fw_st.c, bw_st.c])  # Concatenation of fw and bw cell
    old_h = tf.concat(axis=1, values=[fw_st.h, bw_st.h])  # Concatenation of fw and bw state
    new_c = tf.nn.relu(tf.matmul(old_c, w_reduce_c) + bias_reduce_c)  # Get new cell from old cell
    new_h = tf.nn.relu(tf.matmul(old_h, w_reduce_h) + bias_reduce_h)  # Get new state from old state
    return tf.contrib.rnn.LSTMStateTuple(new_c, new_h)  # Return new cell and state
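
A hedged sketch of how the reduced state could be wired up with the bidirectional encoder (the call to self._reduce_states is illustrative; how you thread the result into embedding_attention_decoder as its initial state depends on your copy of the model code):

# Illustrative wiring only: collapse the forward and backward final encoder
# states into a single LSTMStateTuple for the unidirectional decoder.
encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(
    encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)

decoder_initial_state = self._reduce_states(encoder_state_fw, encoder_state_bw)
# decoder_initial_state now has hidden_dim units and can be passed to the
# attention decoder in place of the old encoder_state.

If encoder_cell is a MultiRNNCell, the returned states are tuples of LSTMStateTuples, and you would need to reduce each layer's forward/backward state pair separately.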
