PyTorch - How to add self-attention on top of another architecture



I am a beginner with the PyTorch framework, and I am trying to add a multi-head self-attention on top of another architecture (BERT) (this is a simple question, but I am not familiar with PyTorch):

Update 1

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)
        self.d_model = d_model
        # precompute the sinusoidal positional encodings
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x, seq_len=768, mask=None):
        pos_emb = self.pe[:, :seq_len]
        x = x * mask[:, :, None].float()
        x = x + pos_emb
        return x

My question of how to add the Transformer comes in the following class:

class CamemBERTQA(nn.Module):
    def __init__(self, bert_type, hidden_size, num_labels, num_inter_layers=1, heads=12, do_lower_case=True):
        super(CamemBERTQA, self).__init__()
        self.do_lower_case = do_lower_case
        self.bert_type = bert_type
        self.hidden_size = hidden_size
        self.num_labels = num_labels
        self.num_inter_layers = num_inter_layers
        self.camembert = CamembertModel.from_pretrained(self.bert_type)

        # ---------------- Transformer ------------------------------------------
        self.d_model = self.hidden_size  # 768
        dropout = 0.1
        self.pos_emb = PositionalEncoding(d_model=self.d_model, dropout=dropout)
        self.transformer_inter = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=self.d_model, nhead=heads, dim_feedforward=2048, dropout=dropout)
             for _ in range(num_inter_layers)])
        # ---------------- Transformer ------------------------------------------

        self.qa_outputs = nn.Linear(self.hidden_size, self.num_labels)

    def forward(self, input_ids, mask=None):
        bert_output = self.camembert(input_ids=input_ids)  # input_ids is a tensor

        # ---------------- Transformer ------------------------------------------
        seq_len = self.hidden_size
        x = self.pos_emb(x=bert_output, seq_len=seq_len, mask=None)
        for i in range(self.num_inter_layers):
            x = self.transformer_inter[i](i, x, x, 1 - mask)  # all_tokens * max_tokens * dim
        output = self.layer_norm(x)
        # ---------------- Transformer ------------------------------------------

        sequence_output = output[0]
        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)
        outputs = (start_logits, end_logits,)
        return x

Thank you very much.

So it seems that you are trying to add a Transformer network on top of the BERT component. It must be mentioned that the self-attention network is only a part of the Transformer network, meaning that Transformers have other components besides self-attention as well. I would recommend using a Transformer (which includes the self-attention component) as an encoder that receives the BERT vectors and transforms them into another representation (in another space).

Try this instead of self.attention = MultiHeadAttention():

self.transformer_inter = nn.ModuleList(
    [TransformerEncoderLayer(d_model, heads, d_ff, dropout)
     for _ in range(num_inter_layers)])

Then in forward(), call self.transformer_inter through a loop, which will give you the representations produced by the Transformer architecture. Like this:

def forward(self, bert_output, mask):
    batch_size, seq_len = bert_output.size(0), bert_output.size(1)
    # Transformer Encoder
    pos_emb = self.pos_emb.pe[:, :seq_len]
    x = bert_output * mask[:, :, None].float()
    x = x + pos_emb
    for i in range(self.num_inter_layers):
        x = self.transformer_inter[i](i, x, x, 1 - mask)  # all_tokens * max_tokens * dim
    x = self.layer_norm(x)  # Transformer also normalizes the outputs from each layer.
    # x is the encoded vectors by Transformer encoder
    return x
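
For this forward() to run, the module also needs the pos_emb, transformer_inter and layer_norm attributes defined in __init__. A minimal sketch of that setup (the class name and hyperparameter values here are assumptions, and TransformerEncoderLayer is still the placeholder class I mention below):

import torch.nn as nn

# hypothetical wrapper; name and default hyperparameters are assumptions
class BertEncoderHead(nn.Module):
    def __init__(self, d_model=768, heads=12, d_ff=2048, dropout=0.1, num_inter_layers=2):
        super().__init__()
        self.num_inter_layers = num_inter_layers
        # positional encodings added on top of the BERT vectors
        self.pos_emb = PositionalEncoding(d_model=d_model, dropout=dropout)
        # stack of Transformer encoder layers (TransformerEncoderLayer is the placeholder class)
        self.transformer_inter = nn.ModuleList(
            [TransformerEncoderLayer(d_model, heads, d_ff, dropout)
             for _ in range(num_inter_layers)])
        # final normalization applied to the last layer's outputs
        self.layer_norm = nn.LayerNorm(d_model, eps=1e-6)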

Then, using a nn.Linear(.) layer, do another transformation to map the hidden_size to the number of labels for your task, which will give you the logits for each label. All of this should be done within the BERT class that you have posted.
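
A minimal sketch of that projection, reusing the qa_outputs layer from the class you posted (assuming num_labels = 2 for start/end span prediction):

# x: encoded vectors from the Transformer encoder, shape (batch, seq_len, hidden_size)
logits = self.qa_outputs(x)              # nn.Linear(hidden_size, num_labels)
start_logits, end_logits = logits.split(1, dim=-1)
start_logits = start_logits.squeeze(-1)  # (batch, seq_len)
end_logits = end_logits.squeeze(-1)      # (batch, seq_len)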

Note that TransformerEncoderLayer is a placeholder class that I used above. So you have to either implement it or use an open-source package. Since Transformers are well known, I think you won't have any problem finding an implementation of one.
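
For instance, PyTorch itself ships nn.TransformerEncoderLayer and nn.TransformerEncoder. Note that the built-in layer's forward takes a single src tensor plus optional masks (not the (i, x, x, 1 - mask) signature of my placeholder) and by default expects inputs shaped (seq_len, batch, d_model). A rough sketch with dummy shapes (the batch size and sequence length are just examples):

import torch
import torch.nn as nn

d_model, heads, d_ff, dropout, num_inter_layers = 768, 12, 2048, 0.1, 2

# built-in encoder layer; by default it expects (seq_len, batch, d_model) inputs
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=heads,
                                           dim_feedforward=d_ff, dropout=dropout)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_inter_layers)

bert_output = torch.randn(8, 128, d_model)            # dummy (batch, seq_len, hidden) BERT vectors
x = bert_output.transpose(0, 1)                       # -> (seq_len, batch, d_model)
padding_mask = torch.zeros(8, 128, dtype=torch.bool)  # True marks padding positions
x = encoder(x, src_key_padding_mask=padding_mask)     # self-attention over the whole sequence
x = x.transpose(0, 1)                                 # back to (batch, seq_len, d_model)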
