While training my deep learning model, could the validation loss at some point during training fall below the training loss? I've attached the code for my training loop:
import torch
import torch.nn as nn

def train_model(model, train_loader, val_loader, lr):
    """Model training."""
    epochs = 100
    model.train()
    train_losses = []
    val_losses = []
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)
    # Reduce the learning rate if no improvement is observed for `patience` epochs.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=2, verbose=True)
    for epoch in range(epochs):
        for data in train_loader:
            y_pred = model(data)
            loss1 = criterion(y_pred[:, 0], data.y[0])
            loss2 = criterion(y_pred[:, 1], data.y[1])
            train_loss = 0.8 * loss1 + 0.2 * loss2
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()
        train_losses.append(train_loss.detach().numpy())
        with torch.no_grad():
            for data in val_loader:
                y_val = model(data)
                loss1 = criterion(y_val[:, 0], data.y[0])
                loss2 = criterion(y_val[:, 1], data.y[1])
                val_loss = 0.8 * loss1 + 0.2 * loss2
                # scheduler.step(val_loss)
            val_losses.append(val_loss.detach().numpy())
        print(f'Epoch: {epoch}, train_loss: {train_losses[epoch]:.3f}, val_loss: {val_losses[epoch]:.3f}')
    return train_losses, val_losses
This is a multi-task model: I compute the two losses separately and then take a weighted sum. What I'm unsure about is the indentation of val_loss, which may be causing problems at print time. More generally, I'm somewhat confused about the validation step:

1) First, I iterate over all the batches in train_loader and optimize the training loss.

2) Then, I iterate over my val_loader, making predictions on single batches of unseen data, but what I append to the val_losses list is the validation loss computed by the model on the last batch of val_loader. I'm not sure this is correct.

I've attached the training and validation losses printed during training:
Epoch: 0, train_loss: 7.315 , val_loss: 7.027
Epoch: 1, train_loss: 7.227 , val_loss: 6.943
Epoch: 2, train_loss: 7.129 , val_loss: 6.847
Epoch: 3, train_loss: 7.021 , val_loss: 6.741
Epoch: 4, train_loss: 6.901 , val_loss: 6.624
Epoch: 5, train_loss: 6.769 , val_loss: 6.493
Epoch: 6, train_loss: 6.620 , val_loss: 6.347
Epoch: 7, train_loss: 6.452 , val_loss: 6.182
Epoch: 8, train_loss: 6.263 , val_loss: 5.996
Epoch: 9, train_loss: 6.051 , val_loss: 5.788
Epoch: 10, train_loss: 5.814 , val_loss: 5.555
Epoch: 11, train_loss: 5.552 , val_loss: 5.298
Epoch: 12, train_loss: 5.270 , val_loss: 5.022
Epoch: 13, train_loss: 4.972 , val_loss: 4.731
Epoch: 14, train_loss: 4.666 , val_loss: 4.431
Epoch: 15, train_loss: 4.357 , val_loss: 4.129
Epoch: 16, train_loss: 4.049 , val_loss: 3.828
Epoch: 17, train_loss: 3.752 , val_loss: 3.539
Epoch: 18, train_loss: 3.474 , val_loss: 3.269
Epoch: 19, train_loss: 3.220 , val_loss: 3.023
Epoch: 20, train_loss: 2.992 , val_loss: 2.803
Epoch: 21, train_loss: 2.793 , val_loss: 2.613
Epoch: 22, train_loss: 2.626 , val_loss: 2.453
Epoch: 23, train_loss: 2.488 , val_loss: 2.323
Epoch: 24, train_loss: 2.378 , val_loss: 2.220
Epoch: 25, train_loss: 2.290 , val_loss: 2.140
Epoch: 26, train_loss: 2.221 , val_loss: 2.078
Epoch: 27, train_loss: 2.166 , val_loss: 2.029
Epoch: 28, train_loss: 2.121 , val_loss: 1.991
Epoch: 29, train_loss: 2.084 , val_loss: 1.959
Epoch: 30, train_loss: 2.051 , val_loss: 1.932
Epoch: 31, train_loss: 2.022 , val_loss: 1.909
Epoch: 32, train_loss: 1.995 , val_loss: 1.887
Epoch: 33, train_loss: 1.970 , val_loss: 1.867
Epoch: 34, train_loss: 1.947 , val_loss: 1.849
Epoch: 35, train_loss: 1.924 , val_loss: 1.831
Epoch: 36, train_loss: 1.902 , val_loss: 1.815
Epoch: 37, train_loss: 1.880 , val_loss: 1.799
Epoch: 38, train_loss: 1.859 , val_loss: 1.783
Epoch: 39, train_loss: 1.839 , val_loss: 1.769
Epoch: 40, train_loss: 1.820 , val_loss: 1.755
Epoch: 41, train_loss: 1.800 , val_loss: 1.742
Epoch: 42, train_loss: 1.781 , val_loss: 1.730
Epoch: 43, train_loss: 1.763 , val_loss: 1.717
Epoch: 44, train_loss: 1.744 , val_loss: 1.705
Epoch: 45, train_loss: 1.726 , val_loss: 1.694
Epoch: 46, train_loss: 1.708 , val_loss: 1.683
...
So I suspect I messed up the indentation.
Yes, the validation loss can be lower than the training loss.

As you noted in point 2, you only store/append the training and validation loss of the last batch of each epoch. That is probably not what you want: you likely want to store the loss at every iteration and take its mean at the end of the epoch. The epoch mean gives you a better picture of training progress, since it reflects the loss over the whole dataset rather than a single batch.
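A minimal sketch of that fix, accumulating per-batch losses and averaging them per epoch. This is a hypothetical setup, not your exact pipeline: it assumes plain `(x, y)` batches with a 2-column target tensor, whereas your loader yields data objects indexed via `data.y[0]` / `data.y[1]`, so you would keep your own indexing there. It also switches between `model.train()` and `model.eval()` each epoch, which your code currently doesn't do:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_model(model, train_loader, val_loader, lr, epochs=5):
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)
    train_losses, val_losses = [], []
    for epoch in range(epochs):
        model.train()
        batch_losses = []
        for x, y in train_loader:
            pred = model(x)
            loss = 0.8 * criterion(pred[:, 0], y[:, 0]) + 0.2 * criterion(pred[:, 1], y[:, 1])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            batch_losses.append(loss.item())  # store EVERY batch, not just the last
        train_losses.append(sum(batch_losses) / len(batch_losses))  # epoch mean

        model.eval()  # disable dropout/batch-norm updates during validation
        batch_losses = []
        with torch.no_grad():
            for x, y in val_loader:
                pred = model(x)
                loss = 0.8 * criterion(pred[:, 0], y[:, 0]) + 0.2 * criterion(pred[:, 1], y[:, 1])
                batch_losses.append(loss.item())
        val_losses.append(sum(batch_losses) / len(batch_losses))
        print(f'Epoch: {epoch}, train_loss: {train_losses[-1]:.3f}, val_loss: {val_losses[-1]:.3f}')
    return train_losses, val_losses

# Tiny synthetic demo so the sketch runs end to end.
torch.manual_seed(0)
X, Y = torch.randn(64, 4), torch.randn(64, 2)
ds = TensorDataset(X, Y)
model = nn.Linear(4, 2)
train_losses, val_losses = train_model(
    model, DataLoader(ds, batch_size=16), DataLoader(ds, batch_size=16), lr=1e-3)
```

Note that `loss.item()` already detaches the scalar from the graph, so there is no need for `.detach().numpy()`, and indexing with `[-1]` instead of `[epoch]` makes the print robust regardless of where the append happens.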