Correct Keras LSTM input shape after text embedding

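I'm trying to get a better understanding of the Keras LSTM layer with respect to timesteps, but I'm still struggling a bit.

I want to create a model that is able to compare 2 inputs (a Siamese network). So my input is a preprocessed text, twice. The preprocessing is done as follows: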

max_len = 64
data['cleaned_text_1'] = assets.apply(lambda x: clean_string(data[]), axis=1)
data['text_1_seq'] = t.texts_to_sequences(cleaned_text_1.astype(str).values)
data['text_1_seq_pad'] = [list(x) for x in pad_sequences(assets['text_1_seq'], maxlen=max_len, padding='post')]

The same is done for the second text input. t is the text tokenizer.

I defined the model with:

common_embed = Embedding(
    name="synopsis_embedd",
    input_dim=len(t.word_index)+1,
    output_dim=300,
    input_length=len(data['text_1_seq_pad'].tolist()[0]),
    trainable=True
)
lstm_layer = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2)
)
input1 = tf.keras.Input(shape=(len(data['text_1_seq_pad'].tolist()[0]),))
e1 = common_embed(input1)
x1 = lstm_layer(e1)
input2 = tf.keras.Input(shape=(len(data['text_1_seq_pad'].tolist()[0]),))
e2 = common_embed(input2)
x2 = lstm_layer(e2)
merged = tf.keras.layers.Lambda(
    function=l1_distance, output_shape=l1_dist_output_shape, name='L1_distance'
)([x1, x2])
conc = Concatenate(axis=-1)([merged, x1, x2])
x = Dropout(0.01)(conc)
preds = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=[input1, input2], outputs=preds)

This seems to work if I feed NumPy data with the fit method:

model.fit(
    x=[np.array(data['text_1_seq_pad'].tolist()), np.array(data['text_2_seq_pad'].tolist())],
    y=y_train.values.reshape(-1,1),
    epochs=epochs,
    batch_size=batch_size,
    validation_data=([np.array(val['text_1_seq_pad'].tolist()), np.array(val['text_2_seq_pad'].tolist())], y_val.values.reshape(-1,1)),
)

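What I'm currently trying to understand is what the shape of the LSTM layer input is in my case, in terms of:

samples
time_step
features

Is it correct that the input_shape for the LSTM layer would be input_shape=(300,1), because I set the embedding output dim to 300 and I have only 1 input feature per LSTM?

Do I need to reshape the embedding output, or can I just set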

lstm_layer = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(32, input_shape=(300,1), dropout=0.2, recurrent_dropout=0.2)
)

from the embedding output?

An example notebook can be found on GitHub or Colab.

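In general, an LSTM layer needs a 3D input shaped like this: (batch_size, length of an input sequence, number of features). The batch size is not really important here, so you can consider that one input needs to have the shape (length of sequence, number of features per item).

In your case, the output dim of your embedding layer is 300. So your LSTM has 300 features, not 1.

Then, using an LSTM on batches of sentences requires a constant number of tokens per sequence: you can't stack a text with 12 tokens and another one with 68 tokens in the same input array. So you need to fix a limit and pad the sequences where needed. If your sentence is 20 tokens long and your limit is 50, you need to pad the sequence (add at the end of it) with 30 "neutral" tokens (often zeros).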
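The padding step described above can be sketched in plain Python. This is only an illustration: pad_post is a hypothetical helper written for this example, not a Keras function, and it assumes over-long sequences are cut at the end. In the question's code, pad_sequences with padding='post' does the padding.

```python
def pad_post(tokens, max_len, pad_value=0):
    """Right-pad (or cut at the end) a list of token ids to exactly max_len."""
    clipped = tokens[:max_len]
    return clipped + [pad_value] * (max_len - len(clipped))


# A hypothetical sentence of 20 token ids, padded to a limit of 50.
sentence = list(range(1, 21))
padded = pad_post(sentence, max_len=50)

print(len(padded))                 # 50
print(padded[20:] == [0] * 30)     # True: 30 zeros were appended
```

Every text then has the same length, so the texts can be stacked into a single (number of texts, 50) array.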

In the end, your LSTM input dimension must be (number of tokens per text, dimension of your embedding outputs), which is (50, 300) in my example and (64, 300) in your case, since max_len = 64.
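To check these shapes, here is a minimal sketch that pushes a dummy batch through an Embedding layer and a bidirectional LSTM. It is not the full model from the question: vocab_size is a made-up value and the batch contains only zeros, while max_len, the embedding dimension and the 32 LSTM units follow the question.

```python
import tensorflow as tf

max_len = 64       # tokens per padded text, as in the question
vocab_size = 1000  # hypothetical vocabulary size, for illustration only
embed_dim = 300    # embedding output dimension, as in the question

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
bi_lstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32))

# Dummy batch of 2 padded texts: shape (batch, time_steps) = (2, 64).
batch = tf.zeros((2, max_len), dtype=tf.int32)

embedded = embedding(batch)  # (batch, time_steps, features)
encoded = bi_lstm(embedded)  # last forward and backward states, concatenated

embedded_shape = tuple(embedded.shape)
encoded_shape = tuple(encoded.shape)
print(embedded_shape)  # (2, 64, 300)
print(encoded_shape)   # (2, 64): 2 directions x 32 units, unrelated to max_len
```

The LSTM receives the 3D embedding output directly, so no reshape and no input_shape=(300,1) argument should be needed.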

To learn more, I suggest you check out the following (in your case, you can read time_steps as number_of_tokens):

https://shiva-verma.medium.com/understanding-input-and-output-shape-in-lstm-keras-c501ee95c65e
