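My Python code works with base transformer models, but when I try to use a "large" model or a RoBERTa model, I get error messages. The most common one is printed below.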

Epoch 1 / 40

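RuntimeError                              Traceback (most recent call last)
in ()
     12
     13     # train model
---> 14     train_loss, _ = fine_tune()
     15     # we are not interested in the second item of the model output (total_preds)
     16     # we do not want the average loss value "avg_loss" here

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1688     if input.dim() == 2 and bias is not None:
   1689         # fused op is marginally faster
-> 1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
   1692         output = input.matmul(weight.t())

RuntimeError: mat1 dim 1 must match mat2 dim 0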

I am guessing there is some kind of mismatch between matrices (tensors) such that an operation cannot occur. If I can better understand the issue, I can better address the necessary changes to my code. Here is the fine-tuning function I am using:

def fine_tune():
    model.train()
    total_loss, total_accuracy = 0, 0

    # empty list to save model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to gpu
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        model.zero_grad()
        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip the gradients to 1.0; this helps prevent the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # length of preds is the same as the batch size
        # append the model predictions
        total_preds.append(preds)

    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)

    # reshape the predictions in the form of (number of samples, number of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    return avg_loss, total_preds

Regards, Mark

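I wrote a print statement to show the size of the input coming from the pretrained model. This revealed the true size, 1024, rather than the hard-coded default of 768 in the program I had modified. Once I understood the problem, it was easy to fix. The moral of the story for me: when a YouTuber (actually a good one!) says "all transformers have an output size of 768", don't take it as gospel!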
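For anyone hitting the same error, here is a minimal sketch of the fix described above. The post does not show the model class, so the `Classifier` head, its 512-unit middle layer, and the random `pooled_large` tensor are all hypothetical stand-ins; the only point is that the first linear layer should be sized from the encoder output rather than from a hard-coded 768.

```python
import torch
import torch.nn as nn


class Classifier(nn.Module):
    """Hypothetical classification head placed on top of an encoder's pooled output."""

    def __init__(self, hidden_size, num_classes=2):
        super().__init__()
        # Size the first layer from the encoder width, not from a hard-coded 768.
        self.fc1 = nn.Linear(hidden_size, 512)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, pooled):
        return self.fc2(self.relu(self.fc1(pooled)))


# Stand-in for the pooled output of a "large" model: batch of 4, 1024 features.
pooled_large = torch.randn(4, 1024)
# The kind of print statement that exposes the real feature size.
print(pooled_large.shape)

# A head built for 768 features fails on the 1024-wide output.
try:
    Classifier(hidden_size=768)(pooled_large)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("shape mismatch:", err)

# Sizing the head from the tensor itself works for base and large models alike.
head = Classifier(hidden_size=pooled_large.shape[-1])
logits = head(pooled_large)
print(logits.shape)
```

If the encoder is loaded through Hugging Face Transformers (an assumption, since the post does not say which library is used), the same number is usually available as `config.hidden_size` on the loaded model, so the head can be built from that instead of from a sample tensor.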

Unable to use existing code that works with base transformers on "large" models
