A question about the validation loop in PyTorch: val_loss lower than train_loss



When running my deep learning model, can my validation loss be lower than my training loss at some point during training? I attach the code of my training loop:

def train_model(model, train_loader, val_loader, lr):
    """Model training."""
    epochs = 100
    model.train()
    train_losses = []
    val_losses = []
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)

    # Reduce the learning rate if no improvement is observed for `patience` epochs.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=2, verbose=True)

    for epoch in range(epochs):
        for data in train_loader:
            y_pred = model.forward(data)
            loss1 = criterion(y_pred[:, 0], data.y[0])
            loss2 = criterion(y_pred[:, 1], data.y[1])
            train_loss = 0.8 * loss1 + 0.2 * loss2
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()
        train_losses.append(train_loss.detach().numpy())

        with torch.no_grad():
            for data in val_loader:
                y_val = model.forward(data)
                loss1 = criterion(y_val[:, 0], data.y[0])
                loss2 = criterion(y_val[:, 1], data.y[1])
                val_loss = 0.8 * loss1 + 0.2 * loss2
                # scheduler.step(loss)
        val_losses.append(val_loss.detach().numpy())

        print(f'Epoch: {epoch}, train_loss: {train_losses[epoch]:.3f} , val_loss: {val_losses[epoch]:.3f}')

    return train_losses, val_losses

This is a multi-task model: I compute the two losses separately and then take a weighted sum of them.

What I am not sure about is the indentation of val_loss, which might cause some problems when printing. More generally, I am confused about the validation step:

1) First, I iterate over all the batches in train_loader, backpropagating the training loss on each one.

2) Then I iterate over my val_loader, making predictions on single batches of unseen data, but what I append to the val_losses list is the validation loss computed by the model on the last batch of val_loader. I am not sure this is correct. I attach the train and validation losses printed during training:

Epoch: 0, train_loss: 7.315 , val_loss: 7.027
Epoch: 1, train_loss: 7.227 , val_loss: 6.943
Epoch: 2, train_loss: 7.129 , val_loss: 6.847
Epoch: 3, train_loss: 7.021 , val_loss: 6.741
Epoch: 4, train_loss: 6.901 , val_loss: 6.624
Epoch: 5, train_loss: 6.769 , val_loss: 6.493
Epoch: 6, train_loss: 6.620 , val_loss: 6.347
Epoch: 7, train_loss: 6.452 , val_loss: 6.182
Epoch: 8, train_loss: 6.263 , val_loss: 5.996
Epoch: 9, train_loss: 6.051 , val_loss: 5.788
Epoch: 10, train_loss: 5.814 , val_loss: 5.555
Epoch: 11, train_loss: 5.552 , val_loss: 5.298
Epoch: 12, train_loss: 5.270 , val_loss: 5.022
Epoch: 13, train_loss: 4.972 , val_loss: 4.731
Epoch: 14, train_loss: 4.666 , val_loss: 4.431
Epoch: 15, train_loss: 4.357 , val_loss: 4.129
Epoch: 16, train_loss: 4.049 , val_loss: 3.828
Epoch: 17, train_loss: 3.752 , val_loss: 3.539
Epoch: 18, train_loss: 3.474 , val_loss: 3.269
Epoch: 19, train_loss: 3.220 , val_loss: 3.023
Epoch: 20, train_loss: 2.992 , val_loss: 2.803
Epoch: 21, train_loss: 2.793 , val_loss: 2.613
Epoch: 22, train_loss: 2.626 , val_loss: 2.453
Epoch: 23, train_loss: 2.488 , val_loss: 2.323
Epoch: 24, train_loss: 2.378 , val_loss: 2.220
Epoch: 25, train_loss: 2.290 , val_loss: 2.140
Epoch: 26, train_loss: 2.221 , val_loss: 2.078
Epoch: 27, train_loss: 2.166 , val_loss: 2.029
Epoch: 28, train_loss: 2.121 , val_loss: 1.991
Epoch: 29, train_loss: 2.084 , val_loss: 1.959
Epoch: 30, train_loss: 2.051 , val_loss: 1.932
Epoch: 31, train_loss: 2.022 , val_loss: 1.909
Epoch: 32, train_loss: 1.995 , val_loss: 1.887
Epoch: 33, train_loss: 1.970 , val_loss: 1.867
Epoch: 34, train_loss: 1.947 , val_loss: 1.849
Epoch: 35, train_loss: 1.924 , val_loss: 1.831
Epoch: 36, train_loss: 1.902 , val_loss: 1.815
Epoch: 37, train_loss: 1.880 , val_loss: 1.799
Epoch: 38, train_loss: 1.859 , val_loss: 1.783
Epoch: 39, train_loss: 1.839 , val_loss: 1.769
Epoch: 40, train_loss: 1.820 , val_loss: 1.755
Epoch: 41, train_loss: 1.800 , val_loss: 1.742
Epoch: 42, train_loss: 1.781 , val_loss: 1.730
Epoch: 43, train_loss: 1.763 , val_loss: 1.717
Epoch: 44, train_loss: 1.744 , val_loss: 1.705
Epoch: 45, train_loss: 1.726 , val_loss: 1.694
Epoch: 46, train_loss: 1.708 , val_loss: 1.683
...

So I suspect I messed up the indentation.

Yes, the validation loss can be lower than the training loss. This is not unusual: regularization (e.g. dropout) is active during training but not during evaluation, the training loss is measured while the weights are still being updated whereas the validation loss is measured afterwards, and the validation split may simply be easier than the training split.

As you mention in your point 2, you only store/append the training and validation losses of the last batch. That is probably not what you want: you likely want to accumulate the loss at every iteration and append its average at the end of each epoch. That gives you a better picture of training progress, since it reflects the loss over the whole dataset rather than a single batch.
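A framework-agnostic sketch of that idea (here `compute_loss` is a hypothetical stand-in for the forward pass plus the weighted MSE terms, so the averaging logic can be shown on its own):

```python
def train_with_epoch_means(train_batches, val_batches, epochs, compute_loss):
    """Collect per-batch losses and append their epoch mean, instead of
    appending only the loss of the last batch."""
    train_losses, val_losses = [], []
    for epoch in range(epochs):
        # Accumulate the loss of every training batch in this epoch.
        batch_losses = [compute_loss(batch) for batch in train_batches]
        train_losses.append(sum(batch_losses) / len(batch_losses))  # epoch mean

        # Same for validation: average over all batches, not just the last one.
        batch_losses = [compute_loss(batch) for batch in val_batches]
        val_losses.append(sum(batch_losses) / len(batch_losses))
    return train_losses, val_losses
```

In the PyTorch loop above this would mean appending `train_loss.item()` to a per-epoch list inside the batch loop, and appending that list's mean to `train_losses` once the loop finishes (and likewise for `val_losses`).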
