GRU model is not learning



I am trying to fit a GRU model on text data to predict one of 26 labels. The problem is that the model isn't really learning (accuracy is around 4%, which is just chance level). Since I know the problem is learnable, I suspect there is a mistake in my code, but I can't find it.

My data consists of 100K sentences per label (tokenized and word-encoded), where each sentence maps to one of the 26 labels. My task is to predict the label of a new, unseen sentence. I have tried several approaches (e.g. using a batch size > 1), but the one I'm sticking with now is joining every 20 sentences into a single batch, so each sample becomes a bit longer, and fitting the model one batch at a time.
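Roughly, the batching idea looks like the sketch below (placeholder names only; this is a simplification of my preprocessing, which isn't shown here):

import torch

# Rough sketch of the batching idea (placeholder names, not my real pipeline):
# concatenate every 20 encoded sentences into one longer sample that keeps
# their shared label; the model is then fed one such sample/batch at a time.
def join_sentences(encoded_sentences, labels, group_size=20):
    samples = []
    for i in range(0, len(encoded_sentences) - group_size + 1, group_size):
        tokens = [t for sent in encoded_sentences[i:i + group_size] for t in sent]
        samples.append((torch.tensor(tokens, dtype=torch.long), labels[i]))
    return samples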

The model:

class GRU(nn.Module):
    def __init__(self, input_size, num_classes, batch_size):
        super(GRU, self).__init__()
        self.hidden_state = None
        self._batch_first = True
        self.batch_size = batch_size
        self.hidden_size = 256
        self.num_layers = 1
        embedding_dim = 256
        self.embedding = nn.Embedding(input_size, embedding_dim=embedding_dim)
        nn.init.uniform_(self.embedding.weight, -1.0, 1.0)
        self.gru = nn.GRU(embedding_dim, self.hidden_size, self.num_layers, batch_first=self._batch_first)
        self.fc = nn.Linear(self.hidden_size, num_classes)

    def init_hidden(self):
        self.hidden_state = torch.randn(self.num_layers, self.batch_size, self.hidden_size).to(device)

    def forward(self, x):
        embeds = self.embedding(x)
        out, self.hidden_state = self.gru(embeds, self.hidden_state)
        out = out[:, -1, :]
        out = self.fc(out)
        return out

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
learning_rate = 0.001
optimizer = lambda mdl: torch.optim.Adam(mdl.parameters(), lr=learning_rate)

model = GRU(len(vocab), len(encoded_lbls), BATCH_SIZE).to(device)
optim = optimizer(model)
# GRU(
#   (embedding): Embedding(19353, 256)
#   (gru): GRU(256, 256, batch_first=True)
#   (fc): Linear(in_features=256, out_features=26, bias=True)
# )

I have tried different learning rates and different losses, e.g. NLLLoss with LogSoftmax, but it made no difference.
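That is probably expected anyway, since CrossEntropyLoss already combines LogSoftmax and NLLLoss internally; here is a minimal standalone check (not part of my actual code) showing the two setups are equivalent:

import torch
import torch.nn as nn

logits = torch.randn(4, 26)           # (batch, num_classes), raw model outputs
targets = torch.randint(0, 26, (4,))  # class indices

# Option 1: CrossEntropyLoss on raw logits
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: LogSoftmax followed by NLLLoss -- numerically the same
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)

assert torch.allclose(loss_ce, loss_nll)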

Since I thought word n-grams would make a good feature for this problem, I split each batch into word n-grams and feed them to the model one by one, resetting the hidden state before each batch:

model.train(mode=True)
for epoch in range(epochs):
    for label, encoded_txt in train_loader:
        encoded_txt, label = encoded_txt.to(device), label.to(device)
        model.init_hidden()
        output, loss, _ = evaluate(model, optim, encoded_txt, label, train=True)
    # validation eval...

The evaluate() function:

def evaluate(model, optim, txt, label, train=False):
    for ngram in txt.split(NGRAM_LEN):  # NGRAM_LEN = 3
        output = model(ngram)
        loss = criterion(output, label)
        if train:
            optim.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)
            for p in model.parameters():
                p.data.add_(p.grad, alpha=-learning_rate)
            optim.step()

    accuracy = np.mean(np.array([item.item() for item in torch.argmax(output, dim=1)]) == label.cpu().numpy())
    return output, loss.item(), accuracy

These are the results I get after 10 epochs:

Epoch 0: Training Loss: 3.3762, Validation Loss: 3.4029, Validation Accuracy: 3.87%
Epoch 1: Training Loss: 3.3084, Validation Loss: 3.5362, Validation Accuracy: 3.89%
Epoch 2: Training Loss: 3.1202, Validation Loss: 3.8107, Validation Accuracy: 4.32%
Epoch 3: Training Loss: 2.9897, Validation Loss: 4.0599, Validation Accuracy: 4.57%
Epoch 4: Training Loss: 2.9118, Validation Loss: 4.3766, Validation Accuracy: 3.93%
Epoch 5: Training Loss: 2.9161, Validation Loss: 4.4962, Validation Accuracy: 4.23%
Epoch 6: Training Loss: 2.9117, Validation Loss: 4.7663, Validation Accuracy: 4.47%
Epoch 7: Training Loss: 2.9203, Validation Loss: 4.9078, Validation Accuracy: 4.55%
Epoch 8: Training Loss: 2.9253, Validation Loss: 5.1911, Validation Accuracy: 4.49%
Epoch 9: Training Loss: 2.9592, Validation Loss: 5.4946, Validation Accuracy: 4.23%

I would expect at least 60% accuracy on the validation set, but as you can see it is basically chance level. The training loss doesn't really decrease, while the validation loss keeps increasing. I wouldn't call it overfitting either, because the training loss stays quite high, so the model isn't really learning.

Can anyone spot a mistake in the code, or suggest how to debug this?

I don't think you should be calling nn.GRU in a loop like that. nn.GRU is meant to accept an entire sequence of tokens at once. If you want to write the loop by hand, you probably want nn.GRUCell (https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html).

If you look at the example at the bottom of https://pytorch.org/docs/stable/generated/torch.nn.GRU.html, you can pass your whole sequence in a single call. (Just make sure the batch dimension is consistent with the batch_first argument of the GRU constructor.)

Also, you probably don't want to initialize the hidden state randomly. I would initialize it to all zeros.
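For example, something roughly along these lines (just a sketch of the idea with made-up shapes, not tested on your data):

import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, embedding_dim=256, hidden_size=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_size, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                 # x: (batch, seq_len) of token ids
        embeds = self.embedding(x)        # (batch, seq_len, embedding_dim)
        # Passing no h0 makes PyTorch use an all-zero initial hidden state.
        out, _ = self.gru(embeds)         # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])     # logits from the last time step

# Hypothetical usage: one optimizer step per batch, whole sequence in one call.
model = GRUClassifier(vocab_size=19353, num_classes=26)
optim = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

x = torch.randint(0, 19353, (20, 35))     # fake batch: 20 sequences of 35 tokens
y = torch.randint(0, 26, (20,))
optim.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optim.step()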
