Adding a dense layer on top of a Huggingface BERT model



I want to add a dense layer on top of the bare BERT model transformer that outputs raw hidden states, and then fine-tune the resulting model. Specifically, I am using this base model. This is what the model should do:

  1. Encode the sentence (a vector with 768 elements for each token of the sentence)
  2. Keep only the first vector (the one associated with the first token)
  3. Add a dense layer on top of this vector, to get the transformation I need

So far, I have successfully encoded the sentences:

from sklearn.neural_network import MLPRegressor
import torch
from transformers import AutoModel, AutoTokenizer
# List of strings
sentences = [...]
# List of numbers
labels = [...]
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
# 2D array, one line per sentence containing the embedding of the first token
encoded_sentences = torch.stack([model(**tokenizer(s, return_tensors='pt'))[0][0][0]
                                 for s in sentences]).detach().numpy()
regr = MLPRegressor()
regr.fit(encoded_sentences, labels)

This way I can train a neural network by feeding it the encoded sentences. However, this approach obviously does not fine-tune the base BERT model. Can anybody help me? How can I build a model (possibly in pytorch or using the Huggingface library) that can be entirely fine-tuned?

There are two ways to do it. Since you want to fine-tune the model for a downstream task similar to classification, you can directly use the

BertForSequenceClassification class, which performs fine-tuning of a logistic-regression-style classification layer on top of the 768-dimensional output.
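For instance, a minimal sketch of this first approach (the three labels, the example sentence, and the dummy target below are assumptions for illustration, not part of the question):

import torch
from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
# num_labels adds a randomly initialised classification head on top of the pooled 768-d output
model = BertForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-italian-xxl-cased", num_labels=3)

inputs = tokenizer(["Una frase di esempio"], return_tensors="pt", padding=True)
outputs = model(**inputs, labels=torch.tensor([0]))
# outputs.loss can be backpropagated to fine-tune the head and the BERT weights together
outputs.loss.backward()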

Alternatively, you can define a custom module that creates a BERT model from the pre-trained weights and adds layers on top of it:

import torch
from torch import nn
from transformers import AutoTokenizer, BertModel

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        ### New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, 3)  ## 3 is the number of classes in this example

    def forward(self, ids, mask):
        # last_hidden_state has shape (batch_size, sequence_length, 768)
        sequence_output = self.bert(ids, attention_mask=mask).last_hidden_state
        linear1_output = self.linear1(sequence_output[:, 0, :])  ## extract the 1st token's embeddings
        linear2_output = self.linear2(linear1_output)
        return linear2_output

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
model = CustomBERTModel()  # You can pass parameters if required to have a more flexible model
model.to(torch.device("cpu"))  ## can be "cuda" for GPU
criterion = nn.CrossEntropyLoss()  ## if required, define your own criterion
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))

for epoch in range(epochs):
    for batch in data_loader:  ## assuming a DataLoader() object that yields (data, targets) tuples
        data = batch[0]
        targets = batch[1]

        optimizer.zero_grad()
        encoding = tokenizer.batch_encode_plus(
            data, return_tensors='pt', padding=True, truncation=True,
            max_length=50, add_special_tokens=True)
        input_ids = encoding['input_ids']
        attention_mask = encoding['attention_mask']
        outputs = model(input_ids, attention_mask)  # raw logits; CrossEntropyLoss applies softmax internally
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

For anyone using Tensorflow/Keras, the equivalent of Ashwin's answer above:

from tensorflow import keras
from transformers import AutoTokenizer, TFAutoModel

class CustomBERTModel(keras.Model):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = TFAutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
        ### New layers:
        self.linear1 = keras.layers.Dense(256)
        self.linear2 = keras.layers.Dense(3)  ## 3 is the number of classes in this example

    def call(self, inputs, training=False):
        # call expects only one positional argument, so pass a tuple and unpack it.
        # `training` is a special reserved Keras parameter.
        ids, mask = inputs
        # sequence_output has shape (batch_size, sequence_length, 768)
        sequence_output = self.bert(ids, mask, training=training).last_hidden_state
        linear1_output = self.linear1(sequence_output[:, 0, :])  ## extract the 1st token's embeddings
        linear2_output = self.linear2(linear1_output)
        return linear2_output

model = CustomBERTModel()
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
ipts = tokenizer("Some input sequence", return_tensors="tf")
test = model((ipts["input_ids"], ipts["attention_mask"]))

Then, to train the model, you can write a custom training loop using GradientTape.

You can verify that the additional layers are also trainable with model.trainable_weights. You can access the weights of individual layers, e.g. model.trainable_weights[-1].numpy() would get the bias vector of the last layer. [Note that the Dense layers will only appear after the call method is executed for the first time.]
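A minimal sketch of such a GradientTape loop, reusing the model defined above (the dataset variable, learning rate, and loss function below are assumptions for illustration, not part of the original answer):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for input_ids, attention_mask, y_batch in dataset:  # assumed tf.data.Dataset of tokenized batches
    with tf.GradientTape() as tape:
        logits = model((input_ids, attention_mask), training=True)
        loss = loss_fn(y_batch, logits)
    # gradients flow through both the new Dense layers and the BERT weights
    grads = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))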

If you want to tune the BERT model itself, you will need to modify the model's parameters. To do this, you will most likely want to do your work with PyTorch. Here is some rough pseudocode to illustrate:

from torch.optim import SGD

model = ...                          # whatever model you are using
parameters = model.parameters()      # or some more specific set of parameters
optimizer = SGD(parameters, lr=.01)  # or whatever optimizer you want
optimizer.zero_grad()                # boilerplate pytorch function
input = ...                          # whatever the appropriate input for your task is
label = ...                          # whatever the appropriate label for your task is
loss = model(**input, labels=label)[0]  # usually the loss is the first item returned
loss.backward()                      # calculates gradients
optimizer.step()                     # runs the optimization algorithm

I have omitted all the relevant details because they are quite tedious and specific to your particular task. Huggingface has a nice article walking through this, and you will definitely want to refer to the pytorch documentation when using anything from pytorch. I highly recommend the pytorch blitz before trying to do anything serious with it.
