OOM error when fine-tuning a pre-trained BERT model



I am fine-tuning a BertForPreTraining model, which has both the MLM and NSP heads on top. When I run the code, it throws an out-of-memory error the moment the first batch is moved to the GPU. I have even tried very small batch sizes, but none of them work. My GPU has 2 GB of RAM, and I also tried the code on a GPU with 4 GB of RAM, but I still get the same error. I am using PyTorch with CUDA 11.3. Here is my code:

import pathlib
import random

import torch
from torch.optim import AdamW
from tqdm import tqdm
from transformers import AutoTokenizer, BertForPreTraining

current_loc = pathlib.Path(__file__).parent.absolute()
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = BertForPreTraining.from_pretrained('bert-base-uncased')
with open("%s/clean.txt" % current_loc, 'r') as fp:
    text = fp.read().split('\n')

bag = [item for sentence in text for item in sentence.split('.') if item != '']
bag_size = len(bag)
sentence_a = []
sentence_b = []
label = []
for paragraph in text:
    sentences = [
        sentence for sentence in paragraph.split('.') if sentence != ''
    ]
    num_sentences = len(sentences)
    if num_sentences > 1:
        start = random.randint(0, num_sentences-2)
        # 50/50 whether is IsNextSentence or NotNextSentence
        if random.random() >= 0.5:
            # this is IsNextSentence
            sentence_a.append(sentences[start])
            sentence_b.append(sentences[start+1])
            label.append(0)
        else:
            index = random.randint(0, bag_size-1)
            # this is NotNextSentence
            sentence_a.append(sentences[start])
            sentence_b.append(bag[index])
            label.append(1)

inputs = tokenizer(sentence_a, sentence_b, return_tensors='pt',
max_length=512, truncation=True, padding='max_length')
inputs['next_sentence_label'] = torch.LongTensor([label]).T
inputs['labels'] = inputs.input_ids.detach().clone()
# create random array of floats with equal dimensions to input_ids tensor
rand = torch.rand(inputs.input_ids.shape)
# create mask array, excluding [CLS] (101), [SEP] (102) and [PAD] (0) tokens
mask_arr = (rand < 0.15) * (inputs.input_ids != 101) * \
           (inputs.input_ids != 102) * (inputs.input_ids != 0)

selection = []
for i in range(inputs.input_ids.shape[0]):
    selection.append(
        torch.flatten(mask_arr[i].nonzero()).tolist()
    )
# replace the selected positions with the [MASK] token id (103)
for i in range(inputs.input_ids.shape[0]):
    inputs.input_ids[i, selection[i]] = 103

class OurDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __getitem__(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    def __len__(self):
        return len(self.encodings.input_ids)

dataset = OurDataset(inputs)
loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# and move our model over to the selected device
model.to(device)
# activate training mode
model.train()
# initialize optimizer
optim = AdamW(model.parameters(), lr=1e-4)
# torch.cuda.empty_cache()
# torch.cuda.memory_summary(device=None, abbreviated=False)
epochs = 2
for epoch in range(epochs):
    # setup loop with TQDM and dataloader
    loop = tqdm(loader, leave=True)
    for batch in loop:
        # initialize calculated gradients (from prev step)
        optim.zero_grad()
        # pull all tensor batches required for training
        input_ids = batch['input_ids'].to(device)
        token_type_ids = batch['token_type_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        next_sentence_label = batch['next_sentence_label'].to(device)
        labels = batch['labels'].to(device)
        # process
        outputs = model(input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids,
                        next_sentence_label=next_sentence_label,
                        labels=labels)
        # extract loss
        loss = outputs.loss
        # calculate loss for every parameter that needs grad update
        loss.backward()
        # update parameters
        optim.step()
        # print relevant info to progress bar
        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(loss=loss.item())

I get the following error:

Some weights of BertForPreTraining were not initialized from the model checkpoint at bert-base-uncased and are newly initialized.
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  0%|          | 0/159 [00:00<?, ?it/s]test.py:75: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
  0%|          | 0/159 [00:22<?, ?it/s]
Traceback (most recent call last):
  File "test.py", line 109, in <module>
    outputs = model(input_ids, attention_mask=attention_mask,
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 1093, in forward
    outputs = self.bert(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 995, in forward
    encoder_outputs = self.encoder(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 582, in forward
    layer_outputs = layer_module(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 470, in forward
    self_attention_outputs = self.attention(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 401, in forward
    self_outputs = self.self(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 305, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.17 GiB already allocated; 0 bytes free; 1.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

BERT is a huge and complicated neural network, so I am not surprised it gives you an OOM error. If you have already tried the smallest batch size and it still does not work, I suggest enabling BERT's gradient_checkpointing option. This feature is essentially a trade-off: intermediate activations are recomputed instead of stored, which lowers memory usage at the cost of extra GPU compute. You can read more about it here. I actually hit the same problem on my 12 GB GPU and managed to get around it with gradient checkpointing.

According to the documentation, you can turn on gradient checkpointing for any Transformers model in two ways (a short sketch of both follows the list):

  1. Call the gradient_checkpointing_enable() method:

    model.gradient_checkpointing_enable()

  2. Simply pass the --gradient_checkpointing flag in your Trainer arguments.
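
For a manual training loop like the one in your question, the first option applies directly; with the Trainer API, you switch it on through TrainingArguments instead. Below is a minimal sketch of both, assuming the same bert-base-uncased checkpoint (the output_dir value is just a placeholder):

    from transformers import BertForPreTraining, TrainingArguments

    # option 1: manual training loop - enable checkpointing on the model itself
    model = BertForPreTraining.from_pretrained('bert-base-uncased')
    model.gradient_checkpointing_enable()   # recompute activations in backward to save memory
    model.train()
    # ... the rest of your training loop stays unchanged

    # option 2: Trainer API - set the flag in the training arguments
    training_args = TrainingArguments(
        output_dir='bert-pretraining-out',   # placeholder output path
        per_device_train_batch_size=2,
        gradient_checkpointing=True,
    )

Either way, activations are recomputed during the backward pass, so each step gets slower but peak GPU memory drops considerably.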

If that does not work, you may need to drop the NSP task from your pipeline to make the model simpler; a rough sketch of that change is shown below. If that does not help either, your best option is probably to upgrade your GPU hardware.
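
Dropping NSP would mean switching from BertForPreTraining to an MLM-only head such as BertForMaskedLM, so the model no longer needs sentence pairs or next_sentence_label. A rough sketch of that change, assuming you keep the masking code from your question:

    from transformers import AutoTokenizer, BertForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = BertForMaskedLM.from_pretrained('bert-base-uncased')   # MLM head only, no NSP classifier
    model.gradient_checkpointing_enable()

    # in the training loop, drop next_sentence_label from the forward call:
    # outputs = model(input_ids, attention_mask=attention_mask,
    #                 token_type_ids=token_type_ids, labels=labels)

In the tokenizer call you would then pass only sentence_a (no sentence pairs), and the next_sentence_label line can be removed from the preprocessing as well.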

LATEST UPDATE