How to add multi-class, multi-label layers on top of a pretrained BERT model

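I am trying to use a pretrained BERT model from the huggingface transformers library for a multi-task, multi-class sentence classification problem. I tried the BertForSequenceClassification model from that library, but I could not extend it to more than one task. An example should make this clearer.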

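Suppose we have four different tasks, and for each sentence and each task we have labels as follows:

A: ['a', 'b', 'c', 'd']
B: ['e', 'f', 'g', 'h']
C: ['i', 'j', 'k', 'l']
D: ['m', 'n', 'o', 'p']

Now, if I pass a sentence to this model, I want the output to give me predictions for all four tasks (A, B, C, D).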

This is what I was doing before:

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # Use the 12-layer BERT model, with an uncased vocab.
    num_labels=4,  # The number of output labels--2 for binary classification.
                   # You can increase this for multi-class tasks.
    output_attentions=False,  # Whether the model returns attentions weights.
    output_hidden_states=False,  # Whether the model returns all hidden-states.
)

Then I tried to implement a CustomBERTModel like this:

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModelForSequenceClassification.from_pretrained("bert-base-uncased")
        ### New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, num_classes)  ## num_classes is the number of classes in this example

    def forward(self, ids, mask):
        sequence_output, pooled_output = self.bert(
            ids,
            attention_mask=mask)
        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:, 0, :].view(-1, 768))
        linear2_output = self.linear2(linear1_output)
        return linear2_output

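I have gone through the answers to similar questions, but none of them seems to address my problem. I have tried to cover every point I think is needed to understand it, and I will clarify further if anything in my explanation is unclear. Any answer on this would be very helpful.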

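You should use BertModel rather than BertModelForSequenceClassification, because BertModelForSequenceClassification adds a linear classification layer on top of the BERT model and uses CrossEntropyLoss, which is meant for multi-class classification.

So, first, use BertModel instead of BertModelForSequenceClassification: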

import torch.nn as nn
from transformers import BertModel

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        ### New layers:
        self.linear1 = nn.Linear(768, 256)
        self.linear2 = nn.Linear(256, 4)  ## as you have 4 classes in the output
        # Module form of sigmoid; nn.functional.sigmoid() cannot be called without an input
        self.sig = nn.Sigmoid()

    def forward(self, ids, mask):
        # Index the output instead of tuple-unpacking it: recent transformers
        # versions return a ModelOutput object rather than a plain tuple.
        outputs = self.bert(ids, attention_mask=mask)
        sequence_output = outputs[0]
        # sequence_output has the following shape: (batch_size, sequence_length, 768)
        linear1_output = self.linear1(sequence_output[:, 0, :].view(-1, 768))
        linear2_output = self.linear2(linear1_output)  # feed in linear1's output
        linear2_output = self.sig(linear2_output)
        return linear2_output

Next, multi-label classification uses a "Sigmoid" activation instead of "Softmax" (here, the sigmoid layer has been added in the code above).
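To illustrate the difference, here is a minimal sketch in plain Python, with no PyTorch needed. The logit values and the 0.5 decision threshold are made up for illustration and are not part of the original answer. Softmax makes the four scores compete so they sum to 1, while sigmoid scores each label independently, so several labels can be "on" at once.

```python
import math

def sigmoid(x):
    # Squashes a single logit into (0, 1), independently of the other logits.
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Normalizes a list of logits into a distribution that sums to 1.
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for 4 labels, as the final linear layer might produce.
logits = [2.0, 1.5, -1.0, -3.0]

softmax_probs = softmax(logits)               # competing: exactly one winner
sigmoid_probs = [sigmoid(x) for x in logits]  # independent: any number can be on

# Multi-label decision: threshold each label on its own (0.5 is an assumed cutoff).
multi_label_pred = [p > 0.5 for p in sigmoid_probs]

print("softmax:", [round(p, 3) for p in softmax_probs])
print("sigmoid:", [round(p, 3) for p in sigmoid_probs])
print("labels on:", multi_label_pred)
```

With these example logits, the first two labels both pass the threshold under sigmoid, which a single softmax argmax could not express.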

Also, for multi-label classification you need to use BCELoss instead of CrossEntropyLoss.
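As a rough sketch of how BCELoss might be used, the snippet below assumes multi-hot float targets of shape (batch_size, 4). The random tensor is only a stand-in for the sigmoid output of a model like the one above, and the batch size, target values, and 0.5 threshold are all made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, num_labels = 3, 4

# Stand-in for the model's sigmoid output: probabilities in (0, 1).
probs = torch.sigmoid(torch.randn(batch_size, num_labels))

# Multi-hot targets: each row may have zero, one, or several active labels.
# BCELoss expects float targets with the same shape as the predictions.
targets = torch.tensor([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

loss_fn = nn.BCELoss()          # binary cross-entropy, averaged over all entries
loss = loss_fn(probs, targets)

# At inference time, threshold each label independently (0.5 is an assumed cutoff).
preds = (probs > 0.5).int()

print("loss:", loss.item())
print("predictions:\n", preds)
```

One design note: PyTorch also provides nn.BCEWithLogitsLoss, which applies the sigmoid internally and is generally more numerically stable. If you use it, the model should return the raw output of the last linear layer, without the sigmoid.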
