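My Python code works with base transformer models, but when I try to use a "large" model or a RoBERTa model, I get an error message. The most common message is printed below.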
Epoch 1 / 40
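RuntimeError                              Traceback (most recent call last)
     12
     13     # train model
---> 14     train_loss, _ = fine_tune()
     15     # we don't care about the 2nd item of the model output (total_preds)
     16     # we don't want the average loss value here "avg_loss"

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1688     if input.dim() == 2 and bias is not None:
   1689         # fused op is marginally faster
-> 1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
   1692         output = input.matmul(weight.t())

RuntimeError: mat1 dim 1 must match mat2 dim 0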
I am guessing there is some kind of mismatch between matrices (tensors) such that an operation cannot occur. If I can better understand the issue, I can better address the necessary changes to my code. Here is the fine-tuning function I am using:
# model, train_dataloader, device, cross_entropy and optimizer are defined elsewhere
def fine_tune():
    model.train()
    total_loss, total_accuracy = 0, 0

    # empty list to save the model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and step != 0:
            print(' Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))

        # push the batch to the GPU
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch

        # clear previously calculated gradients
        model.zero_grad()

        # get model predictions for the current batch
        preds = model(sent_id, mask)

        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)

        # add on to the total loss
        total_loss = total_loss + loss.item()

        # backward pass to calculate the gradients
        loss.backward()

        # clip the gradients to 1.0; this helps prevent the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # update parameters
        optimizer.step()

        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()

        # length of preds is the same as the batch size
        # append the model predictions
        total_preds.append(preds)

    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)

    # reshape the predictions into the form (number of samples, number of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    return avg_loss, total_preds
Regards, Mark
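I wrote a print statement to show the size of the input coming from the pretrained model. This revealed the true size, 1024, rather than the default hard-coded value of 768 in the program I had modified. Once I understood the problem, it was easy to fix. For me, the moral of the story is that when a YouTuber (actually a good one!) says "all transformers have an output size of 768", don't take it as gospel!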
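For anyone hitting the same error, here is a minimal sketch of the idea rather than my actual code. `DummyEncoder` and `Classifier` are hypothetical stand-ins I made up for illustration; the encoder just averages embeddings so that the example runs without downloading a model. The point is that the classification head takes its input width from the encoder instead of a hard-coded 768. With a Hugging Face model, the corresponding value should be available as `model.config.hidden_size`.

```python
import torch
import torch.nn as nn


class DummyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained transformer (not a real model)."""

    def __init__(self, hidden_size, vocab_size=100):
        super().__init__()
        self.hidden_size = hidden_size
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, sent_id, mask):
        emb = self.embed(sent_id)                  # (batch, seq_len, hidden_size)
        mask = mask.unsqueeze(-1).float()          # (batch, seq_len, 1)
        # masked mean over the sequence -> (batch, hidden_size)
        return (emb * mask).sum(1) / mask.sum(1).clamp(min=1.0)


class Classifier(nn.Module):
    """Hypothetical head whose input width follows the encoder."""

    def __init__(self, encoder, num_classes=2):
        super().__init__()
        self.encoder = encoder
        # Read the width from the encoder instead of assuming 768.
        self.fc = nn.Linear(encoder.hidden_size, num_classes)

    def forward(self, sent_id, mask):
        pooled = self.encoder(sent_id, mask)
        print("encoder output size:", tuple(pooled.shape))  # the debugging print
        return self.fc(pooled)


sent_id = torch.randint(0, 100, (4, 16))           # fake batch of token ids
mask = torch.ones(4, 16, dtype=torch.long)         # fake attention mask

# Works for both a "base"-sized (768) and a "large"-sized (1024) encoder.
for hidden in (768, 1024):
    logits = Classifier(DummyEncoder(hidden))(sent_id, mask)
    print(hidden, "->", tuple(logits.shape))

# A head hard-coded to 768 fails on a 1024-wide encoder output.
bad_head = nn.Linear(768, 2)
try:
    bad_head(DummyEncoder(1024)(sent_id, mask))
except RuntimeError as err:
    print("shape mismatch:", err)
```

The exact wording of the mismatch message varies between PyTorch versions, but in each case it means that the last dimension of the input does not equal the `in_features` of the linear layer.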