我有一个PyTorch模型,由一个蒸馏器和一个具有以下结构的BiLSTM组成。其目的是通过将变压器的输出连接到BiLSTM的输入,对大量类别(num_labels=1182
)执行令牌分类。
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModelForTokenClassification
import utilities as utils
from global_constants import MAX_DOC_LENGTH
class CustomTorchModel(nn.Module):
def __init__(self, args_model_name_or_path):
id_to_label, label_to_id = utils.unshelve_label_converters()
label_qty = len(list(label_to_id))
self.distilbert_layer = AutoModelForTokenClassification.from_pretrained(
args_model_name_or_path,
id2label=id_to_label,
label2id=label_to_id,
num_labels=label_qty
)
self.bilstm_layer = nn.LSTM(input_size=MAX_DOC_LENGTH,
hidden_size=self.distilbert_layer.config.dim,
num_layers=1,
batch_first=True,
bidirectional=True)
def forward(self, inputs):
print("input_ids size: " + str(inputs[0].size()))
print("attention_mask size: " + str(inputs[1].size()))
distilbert_output = self.distilbert_layer(input_ids=inputs[0], attention_mask=inputs[1])
print("distilbert_output.last_hidden_state size: " + str(distilbert_output.last_hidden_state.size()))
bilstm_output, (last_hidden, last_cell) = self.bilstm_layer(distilbert_output.last_hidden_state)
print("BiLSTM output size: " + str(bilstm_output.size()))
output = self.classification_layer(bilstm_output)
print("output size: " + str(output.size()))
return F.softmax(output)
输出显示每层后的形状。注:256为MAX_DOC_LENGTH
, 768为self.distilbert_layer.config.dim
, 1182为num_labels
。
input_ids size: torch.Size([8, 256])
attention_mask size: torch.Size([8, 256])
distilbert_output.last_hidden_state size: torch.Size([8, 256, 768])
BiLSTM output size: torch.Size([8, 256, 1536])
output size: torch.Size([8, 256, 1182])
这个自定义模型在一个非常标准的Ignite脚本中使用,该脚本利用来训练模型。由于存在多个类别,并且这不是二元分类,因此损失函数应为nn.CrossEntropyLoss
:
criterion = nn.CrossEntropyLoss(reduction='mean')
optimizer = AdamW(model.parameters(), lr=1e-5)
lr_scheduler = ExponentialLR(optimizer, gamma=0.90)
trainer = create_supervised_trainer1(model.to(device), optimizer, criterion, device=device)
这是上面所用方法的定义:
def _prepare_batch(batch, device=None, non_blocking=False):
x = [batch["input_ids"], batch["attention_mask"]] # list
y = batch["labels"]
return (convert_tensor(x, device=device, non_blocking=non_blocking),
convert_tensor(y, device=device, non_blocking=non_blocking))
def create_supervised_trainer1(model, optimizer, loss_fn, metrics={}, device=None):
def _update(engine, batch):
model.train()
optimizer.zero_grad()
x, y = _prepare_batch(batch, device=device)
y_pred = model(x)
transposed_y_pred = torch.transpose(y_pred, 1, 2)
loss = loss_fn(transposed_y_pred, y.long())
loss.backward()
optimizer.step()
return loss.item(), transposed_y_pred, y.long()
def _metrics_transform(output):
return output[1], output[2]
engine = Engine(_update)
for name, metric in metrics.items():
metric._output_transform = _metrics_transform
metric.attach(engine, name)
return engine
我知道我错过了什么,但是我不能弄清楚是什么。执行会产生一个与形状("y"一个dataloader有[8,256],网络产生[8,1182]。即使我按照CrossEntropyLoss
要求的顺序重新排列张量,也会发生这种情况:
Current run is terminating due to exception: Expected target size [8, 1182], got [8, 256]
Engine run is terminating due to exception: Expected target size [8, 1182], got [8, 256]
Traceback (most recent call last):
File "/home/users/user/august/src/main/ignite_script.py", line 456, in run
trainer.run(train_dataloader, max_epochs=epochs)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 892, in run
return self._internal_run()
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 935, in _internal_run
return next(self._internal_run_generator)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 993, in _internal_run_as_gen
self._handle_exception(e)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 638, in _handle_exception
raise e
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 959, in _internal_run_as_gen
epoch_time_taken += yield from self._run_once_on_dataset_as_gen()
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 1087, in _run_once_on_dataset_as_gen
self._handle_exception(e)
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 638, in _handle_exception
raise e
File "/home/users/user/.local/lib/python3.9/site-packages/ignite/engine/engine.py", line 1068, in _run_once_on_dataset_as_gen
self.state.output = self._process_function(self, self.state.batch)
File "/home/users/user/august/src/main/ignite_script.py", line 321, in _update
loss = loss_fn(y_pred, y.float())
File "/home/users/user/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/users/user/.local/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 1163, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/users/user/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected target size [8, 1182], got [8, 256]
根据
文档神经网络。CrossEntropyLoss,您可以通过两种方式指定目标:
- 每个样品的分类指数
- 每个样本的每个类别的概率
但是你应该通过一个特定的形状目标来做上面提到的任何一种方法:
给定:
> criterion = nn.CrossEntropyLoss()
> loss = criterion(input, target)
输入:
Shape: (C)(C), (N, C)(N,C) or (N, C, d_1, d_2, ..., d_K)(N,C,d1,d2,...,dK) with K≥1 in the case of K-dimensional loss.
:
- 如果包含类索引,形状:
()(), (N)(N) or (N, d_1, d_2, ..., d_K)(N,d 1,d2,...,dK) with K≥1 in the case of K-dimensional loss where each value should be between [0, C)[0,C).
- 如果包含类概率,与输入形状相同,每个值应在[0,1][0,1]之间。
输出:
如果约简为"none",则在K维损失情况下,形状()(), (N)(N) or (N, d_1, d_2, ..., d_K)(N,d1,d2,...,dK)
的K≥1,取决于输入的形状。否则,标量。