OOM error when fine-tuning a pre-trained BERT model



I am fine-tuning a BertForPreTraining model, which has both the MLM and NSP heads on top. When I run the code, it throws an out-of-memory error the moment the first batch is moved to the GPU. I have even tried very small batch sizes, but none of them work. My GPU has 2 GB of RAM, and I also tried the code on a GPU with 4 GB of RAM, but I still get the same error. I am using PyTorch with CUDA 11.3. Here is my code:

import pathlib
import random

import torch
from torch.optim import AdamW
from tqdm import tqdm
from transformers import AutoTokenizer, BertForPreTraining

current_loc = pathlib.Path(__file__).parent.absolute()
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = BertForPreTraining.from_pretrained('bert-base-uncased')
with open("%s/clean.txt" % current_loc, 'r') as fp:
    text = fp.read().split('\n')

bag = [item for sentence in text for item in sentence.split('.') if item != '']
bag_size = len(bag)
sentence_a = []
sentence_b = []
label = []
for paragraph in text:
    sentences = [
        sentence for sentence in paragraph.split('.') if sentence != ''
    ]
    num_sentences = len(sentences)
    if num_sentences > 1:
        start = random.randint(0, num_sentences-2)
        # 50/50 whether is IsNextSentence or NotNextSentence
        if random.random() >= 0.5:
            # this is IsNextSentence
            sentence_a.append(sentences[start])
            sentence_b.append(sentences[start+1])
            label.append(0)
        else:
            index = random.randint(0, bag_size-1)
            # this is NotNextSentence
            sentence_a.append(sentences[start])
            sentence_b.append(bag[index])
            label.append(1)

inputs = tokenizer(sentence_a, sentence_b, return_tensors='pt',
max_length=512, truncation=True, padding='max_length')
inputs['next_sentence_label'] = torch.LongTensor([label]).T
inputs['labels'] = inputs.input_ids.detach().clone()
# create random array of floats with equal dimensions to input_ids tensor
rand = torch.rand(inputs.input_ids.shape)
# create mask array, excluding [CLS] (101), [SEP] (102) and [PAD] (0) tokens
mask_arr = (rand < 0.15) * (inputs.input_ids != 101) * \
           (inputs.input_ids != 102) * (inputs.input_ids != 0)

selection = []
for i in range(inputs.input_ids.shape[0]):
    selection.append(
        torch.flatten(mask_arr[i].nonzero()).tolist()
    )
# replace the selected positions with the [MASK] token id (103)
for i in range(inputs.input_ids.shape[0]):
    inputs.input_ids[i, selection[i]] = 103

class OurDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __getitem__(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    def __len__(self):
        return len(self.encodings.input_ids)

dataset = OurDataset(inputs)
loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# and move our model over to the selected device
model.to(device)
# activate training mode
model.train()
# initialize optimizer
optim = AdamW(model.parameters(), lr=1e-4)
# torch.cuda.empty_cache()
# torch.cuda.memory_summary(device=None, abbreviated=False)
epochs = 2
for epoch in range(epochs):
    # setup loop with TQDM and dataloader
    loop = tqdm(loader, leave=True)
    for batch in loop:
        # initialize calculated gradients (from prev step)
        optim.zero_grad()
        # pull all tensor batches required for training
        input_ids = batch['input_ids'].to(device)
        token_type_ids = batch['token_type_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        next_sentence_label = batch['next_sentence_label'].to(device)
        labels = batch['labels'].to(device)
        # process
        outputs = model(input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids,
                        next_sentence_label=next_sentence_label,
                        labels=labels)
        # extract loss
        loss = outputs.loss
        # calculate loss for every parameter that needs grad update
        loss.backward()
        # update parameters
        optim.step()
        # print relevant info to progress bar
        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(loss=loss.item())

I get the following error:

Some weights of BertForPreTraining were not initialized from the model checkpoint at bert-base-uncased and are newly initialized.
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  0%|          | 0/159 [00:00<?, ?it/s]test.py:75: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
  0%|          | 0/159 [00:22<?, ?it/s]
Traceback (most recent call last):
  File "test.py", line 109, in <module>
    outputs = model(input_ids, attention_mask=attention_mask,
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 1093, in forward
    outputs = self.bert(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 995, in forward
    encoder_outputs = self.encoder(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 582, in forward
    layer_outputs = layer_module(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 470, in forward
    self_attention_outputs = self.attention(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 401, in forward
    self_outputs = self.self(
  File "C:\Users\Rastin\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Rastin\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py", line 305, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.17 GiB already allocated; 0 bytes free; 1.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

BERT is a huge and complicated neural network, so I am not surprised it gives you an OOM error. If you have already tried the smallest batch size and it still does not work, I suggest enabling BERT's gradient_checkpointing option. This feature is essentially a trade-off: intermediate activations are recomputed instead of stored, which lowers memory usage at the cost of extra GPU compute. You can read more about it here. I actually hit the same problem on my 12 GB GPU and managed to get around it with gradient checkpointing.

According to the documentation, you can turn on gradient checkpointing for any Transformers model in two ways (a short sketch of both follows the list):

  1. Call the gradient_checkpointing_enable() method:

    model.gradient_checkpointing_enable()

  2. Simply pass the --gradient_checkpointing flag in your Trainer arguments.
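
For a manual training loop like the one in your question, the first option applies directly; with the Trainer API, you switch it on through TrainingArguments instead. Below is a minimal sketch of both, assuming the same bert-base-uncased checkpoint (the output_dir value is just a placeholder):

    from transformers import BertForPreTraining, TrainingArguments

    # option 1: manual training loop - enable checkpointing on the model itself
    model = BertForPreTraining.from_pretrained('bert-base-uncased')
    model.gradient_checkpointing_enable()   # recompute activations in backward to save memory
    model.train()
    # ... the rest of your training loop stays unchanged

    # option 2: Trainer API - set the flag in the training arguments
    training_args = TrainingArguments(
        output_dir='bert-pretraining-out',   # placeholder output path
        per_device_train_batch_size=2,
        gradient_checkpointing=True,
    )

Either way, activations are recomputed during the backward pass, so each step gets slower but peak GPU memory drops considerably.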

If that does not work, you may need to drop the NSP task from your pipeline to make the model simpler; a rough sketch of that change is shown below. If that does not help either, your best option is probably to upgrade your GPU hardware.
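
Dropping NSP would mean switching from BertForPreTraining to an MLM-only head such as BertForMaskedLM, so the model no longer needs sentence pairs or next_sentence_label. A rough sketch of that change, assuming you keep the masking code from your question:

    from transformers import AutoTokenizer, BertForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = BertForMaskedLM.from_pretrained('bert-base-uncased')   # MLM head only, no NSP classifier
    model.gradient_checkpointing_enable()

    # in the training loop, drop next_sentence_label from the forward call:
    # outputs = model(input_ids, attention_mask=attention_mask,
    #                 token_type_ids=token_type_ids, labels=labels)

In the tokenizer call you would then pass only sentence_a (no sentence pairs), and the next_sentence_label line can be removed from the preprocessing as well.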

LATEST UPDATE