Here is the RNN I built with Keras:
from keras.models import Sequential
from keras.layers import BatchNormalization, Dense, LSTM, TimeDistributed
from keras.optimizers import RMSprop

def RNN_keras(feat_num, timestep_num=100):
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
    model.add(LSTM(output_dim=512, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(LSTM(output_dim=128, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(output_dim=1, activation='linear')))  # sequence labeling
    rmsprop = RMSprop(lr=0.00001, rho=0.9, epsilon=1e-08)
    model.compile(loss='mean_squared_error',
                  optimizer=rmsprop,
                  metrics=['mean_squared_error'])
    return model
The output is as follows:
61267 in the training set
6808 in the test set
Building training input vectors ...
888 unique feature names
The length of each vector will be 888
Using TensorFlow backend.
Build model...
****** Iterating over each batch of the training data ******
# Each batch has 1280 examples
# The training data are shuffled at the beginning of each epoch.
Epoch 1/3 : Batch 1/48 | loss = 607.043823 | root_mean_squared_error = 24.638334
Epoch 1/3 : Batch 2/48 | loss = 14479824582732.208323 | root_mean_squared_error = 3805236.468701
Epoch 1/3 : Batch 3/48 | loss = nan | root_mean_squared_error = nan
Epoch 1/3 : Batch 4/48 | loss = nan | root_mean_squared_error = nan
Epoch 1/3 : Batch 5/48 | loss = nan | root_mean_squared_error = nan
......
The loss is already extremely high by the second batch and then becomes nan. The true targets y contain no very large values; the maximum is below 400.
On the other hand, I inspected the predicted output y_hat: the RNN produces some extremely large predictions, which leads to infinity.
Still, I am confused about how to improve my model.
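The log pattern above (loss jumps from ~600 to ~1e13, then nan) is the classic signature of update steps that are too large for the gradient scale. The Keras model itself is not reproduced here, but a minimal numpy sketch of plain gradient descent on an MSE objective (a hypothetical 1-parameter linear model, not the actual RNN) shows the same divergence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x  # true slope is 3, no noise

def run_gd(lr, steps=20):
    """Plain gradient descent on MSE for a 1-parameter linear model."""
    w = 0.0
    losses = []
    for _ in range(steps):
        err = w * x - y
        losses.append(np.mean(err ** 2))
        grad = 2.0 * np.mean(err * x)  # dL/dw
        w -= lr * grad
    return losses

stable = run_gd(lr=0.1)    # loss shrinks toward 0 every step
unstable = run_gd(lr=2.0)  # step overshoots and flips sign each update:
                           # loss grows geometrically, eventually to inf/nan
print(stable[-1], unstable[-1])
```

The mechanism is the same in the full model: once one update produces very large weights (and hence huge predictions), the next squared-error gradient is even larger, and a few batches later the loss overflows to nan.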
The problem was "probably" solved by 1) changing the output layer's activation from "linear" to "relu" and/or 2) lowering the learning rate.
However, the predictions are now all zero.
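One plausible reason the predictions collapse to zero after switching the output activation to relu: relu clamps every negative pre-activation to exactly 0, and since its gradient is also 0 on that region, a tiny learning rate gives the optimizer almost no signal to escape (the "dying ReLU" effect). A quick numpy illustration with hypothetical pre-activation values:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical output-layer pre-activations: if they all happen to be
# negative (e.g. pushed below zero by the preceding BatchNormalization),
# a relu output emits exactly 0 for every timestep.
pre_activations = np.array([-3.2, -0.7, -12.5, -0.01])
print(relu(pre_activations))  # all zeros

# relu'(z) = 0 for z < 0, so the gradient flowing back through these
# units is also zero -- the all-zero predictions can persist.
```

If the targets y can take the full range of values, keeping a linear output activation and instead stabilizing training (e.g. a larger-but-still-safe learning rate, or gradient clipping via RMSprop's clipnorm/clipvalue arguments) may be a better trade-off.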