PyTorch Siamese NN with BERT for sentence matching

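I am trying to build a Siamese neural network in PyTorch. I feed it BERT word embeddings and try to determine whether two sentences are similar (think duplicate-post matching, product matching, and so on). Here is the model: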

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class SiameseNetwork(torch.nn.Module):
    def __init__(self):
        super(SiameseNetwork, self).__init__()
        # Shared tower applied to both sentences (flattened 512 x 768 input)
        self.brothers = torch.nn.Sequential(
            torch.nn.Linear(512 * 768, 512),
            torch.nn.BatchNorm1d(512),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(512, 256),
            torch.nn.BatchNorm1d(256),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(256, 32),
        )

        # Classifier head on the squared difference of the two encodings
        self.final = torch.nn.Sequential(
            torch.nn.Linear(32, 16),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(16, 2),
        )

    def forward(self, left, right):
        outputLeft = self.brothers(left)
        outputRight = self.brothers(right)
        output = self.final((outputLeft - outputRight) ** 2)
        return output

bros = SiameseNetwork()
bros = bros.to(device)

Criterion and optimizer:

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=bros.parameters(), lr=0.001)

Training loop:

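# tLoader, epoch and trainingLoss are defined elsewhere in the training script
for batch in tqdm(tLoader, desc=f"Train epoch: {epoch+1}"):
    a = batch[0].to(device)
    b = batch[1].to(device)
    # CrossEntropyLoss on two logits expects class indices of shape (batch,)
    # with dtype long; this assumes batch[2] holds 0/1 labels
    y = batch[2].long().to(device)

    optimizer.zero_grad()

    output = bros(a, b)
    loss = criterion(output, y)
    loss.backward()

    trainingLoss += loss.item()
    optimizer.step()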

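This seems to work, in that it produces reasonable results, but the validation error stops decreasing at 0.13 after just a few epochs. I couldn't find much about this kind of network in PyTorch. Is there a way to optimize it? Am I doing something wrong?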

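Your first layer is severely over-parameterized and prone to overfitting (it accounts for roughly 201 million parameters). I assume the shape 512 * 768 reflects the number of tokens times their embedding dimension; if so, you need to rethink your architecture. You need some kind of weight sharing or pooling strategy to reduce the num_words * dim input to a fixed-size representation (this is exactly why recurrent networks replaced fully connected variants for sentence encoding). In transformer-based architectures in particular, the [CLS] token (token number 0, prepended to the input) is typically used as the "summary" token for sequence-level and sequence-pair tasks.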
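The pooling idea can be sketched as follows. This is a minimal illustration, not a tuned architecture: it assumes the per-token embeddings arrive as a tensor of shape (batch, 512, 768) with [CLS] at position 0, and the names `cls_pool`, `mean_pool`, `PooledSiameseNetwork` and `count_parameters`, as well as the layer sizes, are hypothetical choices made for this example.

```python
import torch


def cls_pool(token_embeddings: torch.Tensor) -> torch.Tensor:
    """Keep only the embedding at position 0, assumed to be [CLS]."""
    return token_embeddings[:, 0, :]  # (batch, dim)


def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the token embeddings, ignoring padded positions (mask == 0)."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts  # (batch, dim)


class PooledSiameseNetwork(torch.nn.Module):
    """Same Siamese layout as the question, but on a pooled 768-dim input."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.brothers = torch.nn.Sequential(
            torch.nn.Linear(dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 32),
        )
        self.final = torch.nn.Sequential(
            torch.nn.Linear(32, 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 2),
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        out_left = self.brothers(left)
        out_right = self.brothers(right)
        return self.final((out_left - out_right) ** 2)


def count_parameters(module: torch.nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


# Random stand-ins for BERT outputs: (batch, tokens, dim)
left_tokens = torch.randn(2, 512, 768)
right_tokens = torch.randn(2, 512, 768)

model = PooledSiameseNetwork()
logits = model(cls_pool(left_tokens), cls_pool(right_tokens))  # (batch, 2)
```

With a pooled input the first linear layer sees 768 features instead of 512 * 768, so the whole model stays well under a million parameters. Whether `cls_pool` or `mean_pool` works better depends on the data and on whether BERT itself is fine-tuned, so it is worth trying both.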
