I built an RNN with Keras to solve a regression problem:
from keras.models import Sequential
from keras.layers import BatchNormalization, LSTM, TimeDistributed, Dense
from keras.optimizers import RMSprop

def RNN_keras(feat_num, timestep_num=100):
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
    model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=512, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(LSTM(output_dim=128, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(output_dim=1, activation='relu')))  # sequence labeling
    rmsprop = RMSprop(lr=0.00001, rho=0.9, epsilon=1e-08)
    model.compile(loss='mean_squared_error',
                  optimizer=rmsprop,
                  metrics=['mean_squared_error'])
    return model
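The log below comes from a custom training loop (not shown). For context, here is a minimal sketch of what such a loop might look like with train_on_batch; X_train and y_train are hypothetical arrays of shape (n_samples, 100, 888) and (n_samples, 100, 1), and the batch size of 1280 and per-epoch shuffling match the log:

import numpy as np

model = RNN_keras(feat_num=888, timestep_num=100)
batch_size = 1280
for epoch in range(3):
    idx = np.random.permutation(len(X_train))  # reshuffle at the start of each epoch
    n_batches = len(X_train) // batch_size
    for b in range(n_batches):
        rows = idx[b * batch_size:(b + 1) * batch_size]
        loss, mse = model.train_on_batch(X_train[rows], y_train[rows])
        print("Epoch %d : Batch %d/%d | loss = %f | root_mean_squared_error = %f"
              % (epoch + 1, b + 1, n_batches, loss, np.sqrt(mse)))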
The whole process looks fine, but the loss stays exactly the same over the epochs:
61267 in the training set
6808 in the test set
Building training input vectors ...
888 unique feature names
The length of each vector will be 888
Using TensorFlow backend.
Build model...
# Each batch has 1280 examples
# The training data are shuffled at the beginning of each epoch.
****** Iterating over each batch of the training data ******
Epoch 1/3 : Batch 1/48 | loss = 11011073.000000 | root_mean_squared_error = 3318.232910
Epoch 1/3 : Batch 2/48 | loss = 620.271667 | root_mean_squared_error = 24.904161
Epoch 1/3 : Batch 3/48 | loss = 620.068665 | root_mean_squared_error = 24.900017
......
Epoch 1/3 : Batch 47/48 | loss = 618.046448 | root_mean_squared_error = 24.859678
Epoch 1/3 : Batch 48/48 | loss = 652.977051 | root_mean_squared_error = 25.552946
****** Epoch 1: RMSD(training) = 24.897174
Epoch 2/3 : Batch 1/48 | loss = 607.372620 | root_mean_squared_error = 24.644049
Epoch 2/3 : Batch 2/48 | loss = 599.667786 | root_mean_squared_error = 24.487448
Epoch 2/3 : Batch 3/48 | loss = 621.368103 | root_mean_squared_error = 24.926300
......
Epoch 2/3 : Batch 47/48 | loss = 620.133667 | root_mean_squared_error = 24.901398
Epoch 2/3 : Batch 48/48 | loss = 639.971924 | root_mean_squared_error = 25.297264
****** Epoch 2: RMSD(training) = 24.897174
Epoch 3/3 : Batch 1/48 | loss = 651.519836 | root_mean_squared_error = 25.523636
Epoch 3/3 : Batch 2/48 | loss = 673.582581 | root_mean_squared_error = 25.952084
Epoch 3/3 : Batch 3/48 | loss = 613.930054 | root_mean_squared_error = 24.776562
......
Epoch 3/3 : Batch 47/48 | loss = 624.460327 | root_mean_squared_error = 24.988203
Epoch 3/3 : Batch 48/48 | loss = 629.544250 | root_mean_squared_error = 25.090448
****** Epoch 3: RMSD(training) = 24.897174
I don't think this is normal. Am I missing something?
UPDATE: I found that all the predictions are zero after every epoch. That is why all the RMSDs are the same: the predictions are all identical, namely 0. I checked the training y; it contains only a few zeros, so this is not caused by imbalanced data.
So now I am wondering whether it is caused by the layers and activations I am using.
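A quick way to confirm this kind of collapse is to compare the spread of the predictions against the spread of the targets. A minimal diagnostic sketch, assuming X_test and y_test are the held-out arrays:

preds = model.predict(X_test)
print("predictions: min=%f max=%f" % (preds.min(), preds.max()))
print("targets:     min=%f max=%f" % (y_test.min(), y_test.max()))
# If the predictions' min and max are both 0, the relu output layer has
# died: it clamps every value to 0 regardless of the input.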
Your RNN function seems fine.
How fast the loss decreases depends on the optimizer and the learning rate.
Either way you are using a decay rate of 0.9, so try a larger learning rate: it will be decayed at a rate of 0.9 anyway.
Try the other optimizers that ship with Keras, with different learning rates (see the sketch below): https://keras.io/optimizers/
Often some optimizers work well on a given dataset while others fail.
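For example, a sketch of recompiling with two alternatives (the optimizer choices and learning rates here are illustrative, not prescriptive):

from keras.optimizers import Adam, SGD

# Adam with its default lr of 0.001, i.e. 100x the 0.00001 used above.
model.compile(loss='mean_squared_error',
              optimizer=Adam(lr=0.001),
              metrics=['mean_squared_error'])

# Or plain SGD with momentum, for comparison:
# model.compile(loss='mean_squared_error',
#               optimizer=SGD(lr=0.01, momentum=0.9),
#               metrics=['mean_squared_error'])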
Have you tried changing the activation function from relu to softmax?
ReLU activations have a tendency to diverge. However, initializing the recurrent weights with the identity matrix may give better convergence, as sketched below.
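A sketch of that identity-initialization idea (this is the IRNN recipe of Le et al., 2015, which applies to SimpleRNN rather than LSTM; inner_init names the recurrent-weight initializer in the Keras 1.x API used above):

from keras.layers import SimpleRNN

# ReLU RNN whose recurrent weights start as the identity matrix.
model.add(SimpleRNN(output_dim=512,
                    init='normal',          # input-to-hidden weights
                    inner_init='identity',  # hidden-to-hidden (recurrent) weights
                    activation='relu',
                    return_sequences=True,
                    input_shape=(timestep_num, feat_num)))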
Since you are using an RNN for regression (and not classification), you should use a 'linear' activation in the last layer. In your code,
model.add(TimeDistributed(Dense(output_dim=1, activation='relu')))  # sequence labeling
change activation='relu' to activation='linear'.
If that does not work, also remove the activation='relu' from the second LSTM layer.
Also, the learning rate for RMSprop usually ranges between 0.1 and 0.0001. A sketch of these changes follows.
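Putting the suggestions together, a minimal sketch of the corrected final layers (only the changed lines are shown; lr=0.001 is just an illustrative value inside the usual RMSprop range):

model.add(LSTM(output_dim=128, return_sequences=True))  # default tanh activation, relu removed
model.add(BatchNormalization())
model.add(TimeDistributed(Dense(output_dim=1, activation='linear')))  # linear output for regression
rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08)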