Error training ELMo - RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1



I am trying to train my own custom ELMo model on AllenNLP.

The following error appears when training the model: RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1. In some runs the size of tensor a is reported as a different value (e.g. 5300). When I tested on a small subset of the files, I was able to train the model successfully.

My intuition is that this has to do with the number of tokens in my data, more specifically with the particular files that contain more than 5000 tokens. However, the AllenNLP package does not expose any parameter that would let me adjust this and get around the error.

Any suggestions on how to overcome this? Would adjusting the PyTorch code to cap the size at 5000 be feasible (and if so, how would I do that)? Any insight would be much appreciated.

FYI, I am currently using a custom DatasetReader for tokenization. I have also generated my own vocab list beforehand (to save some time), which is used to train the ELMo model via AllenNLP.
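
For context, the snippet below is a minimal sketch of how such a vocabulary can be pre-built and saved so that the "from_files" section of the training config can point to it; the input file and output directory are placeholders, not my actual MIMIC setup.

# Minimal sketch: pre-build the vocabulary once and save it to the directory
# later referenced by ELMO_VOCAB_PATH in the training config.
# File names and paths are placeholders.
from collections import Counter

from allennlp.data import Vocabulary

counts = Counter()
with open("discharge_notes.txt") as f:              # hypothetical training file
    for line in f:
        counts.update(line.split())

vocab = Vocabulary(
    counter={"tokens": dict(counts)},
    min_count={"tokens": 3},                        # mirrors the commented-out config below
    tokens_to_add={"tokens": ["<S>", "</S>"]},
)
vocab.save_to_files("elmo_vocab")                   # directory passed as ELMO_VOCAB_PATH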

Update: I found that AllenNLP has a variable max_len=5000, which is why the error shows up (see the code here). I adjusted that parameter to larger values, and in many cases this resulted in CUDA out-of-memory errors, which leads me to believe it should not be touched.
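
The sketch below reproduces the mismatch in plain PyTorch (the shapes are illustrative, not AllenNLP's actual module): the transformer contextualizer adds a positional-encoding table of fixed length max_len=5000 to the token embeddings, so any sequence longer than 5000 tokens fails at the addition shown at the bottom of the traceback below.

# Minimal sketch of the shape mismatch, assuming a fixed positional-encoding
# table of length 5000 and a document with 5158 tokens.
import torch

max_len, dim = 5000, 512
positional_encoding = torch.zeros(1, max_len, dim)  # table built once, length 5000
token_embeddings = torch.zeros(1, 5158, dim)        # one over-long document

try:
    # Mirrors `x + self.positional_encoding[:, : x.size(1)]`: the slice is capped
    # at 5000, so (1, 5158, 512) and (1, 5000, 512) cannot broadcast.
    token_embeddings + positional_encoding[:, : token_embeddings.size(1)]
except RuntimeError as err:
    print(err)  # The size of tensor a (5158) must match the size of tensor b (5000) ...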

Environment: Python 3.6.9, Linux Ubuntu, allennlp==2.9.1, allennlp-models==2.9.0

Traceback:

Traceback (most recent call last):
  File "/home/jiayi/.local/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 121, in main
    args.func(args)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 120, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 179, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 246, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 470, in _train_worker
    metrics = train_loop.run()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 543, in run
    return self.trainer.train()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 720, in train
    metrics, epoch = self._try_train()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 741, in _try_train
    train_metrics = self._train_epoch(epoch)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 459, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 352, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/models/language_model.py", line 257, in forward
    embeddings, mask
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 282, in forward
    token_embeddings = self._position(token_embeddings)
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 68, in forward
    return x + self.positional_encoding[:, : x.size(1)]
RuntimeError: The size of tensor a (5385) must match the size of tensor b (5000) at non-singleton dimension 1

AllenNLP training config file:

// For more info on config files generally, see https://guide.allennlp.org/using-config-files
local NUM_GRAD_ACC = 4;
local BATCH_SIZE = 1;
local BASE_LOADER = {
  "max_instances_in_memory": 8,
  "batch_sampler": {
    "type": "bucket",
    "batch_size": BATCH_SIZE,
    "sorting_keys": ["source"]
  }
};
{
  "dataset_reader": {
    "type": "mimic_reader",
    "token_indexers": {
      "tokens": {
        "type": "single_id"
      },
      "token_characters": {
        "type": "elmo_characters"
      }
    },
    "start_tokens": ["<S>"],
    "end_tokens": ["</S>"],
  },
  "train_data_path": std.extVar("MIMIC3_NOTEEVENTS_DISCHARGE_PATH"),
  // Note: We don't set a validation_data_path because the softmax is only
  // sampled during training. Not sampling on GPUs results in a certain OOM
  // given our large vocabulary. We'll need to evaluate against the test set
  // (when we'll want a full softmax) with the CPU.
  "vocabulary": {
    // Use a prespecified vocabulary for efficiency.
    "type": "from_files",
    "directory": std.extVar("ELMO_VOCAB_PATH"),
    // Plausible config for generating the vocabulary.
    // "tokens_to_add": {
    //     "tokens": ["<S>", "</S>"],
    //     "token_characters": ["<>/S"]
    // },
    // "min_count": {"tokens": 3}
  },
  "model": {
    "type": "language_model",
    "bidirectional": true,
    "num_samples": 8192,
    # Sparse embeddings don't work with DistributedDataParallel.
    "sparse_embeddings": false,
    "text_field_embedder": {
      "token_embedders": {
        "tokens": {
          "type": "empty"
        },
        "token_characters": {
          "type": "character_encoding",
          "embedding": {
            "num_embeddings": 262,
            // Same as the Transformer ELMo in Calypso. Matt reports that
            // this matches the original LSTM ELMo as well.
            "embedding_dim": 16
          },
          "encoder": {
            "type": "cnn-highway",
            "activation": "relu",
            "embedding_dim": 16,
            "filters": [
              [1, 32],
              [2, 32],
              [3, 64],
              [4, 128],
              [5, 256],
              [6, 512],
              [7, 1024]],
            "num_highway": 2,
            "projection_dim": 512,
            "projection_location": "after_highway",
            "do_layer_norm": true
          }
        }
      }
    },
    // Consider the following.
    // remove_bos_eos: true,
    // Applies to the contextualized embeddings.
    "dropout": 0.1,
    "contextualizer": {
      "type": "bidirectional_language_model_transformer",
      "input_dim": 512,
      "hidden_dim": 4096,
      "num_layers": 2,
      "dropout": 0.1,
      "input_dropout": 0.1
    }
  },
  "data_loader": BASE_LOADER,
  // "distributed": {
  //     "cuda_devices": [0, 1],
  // },
  "trainer": {
    "num_epochs": 10,
    "cuda_devices": [0, 1, 2, 3],
    "optimizer": {
      // The gradient accumulators in Adam for the running stdev and mean for
      // words not used in the sampled softmax would be decayed to zero with the
      // standard "adam" optimizer.
      "type": "dense_sparse_adam"
    },
    // "grad_norm": 10.0,
    "learning_rate_scheduler": {
      "type": "noam",
      // See https://github.com/allenai/calypso/blob/master/calypso/train.py#L401
      "model_size": 512,
      // See https://github.com/allenai/calypso/blob/master/bin/train_transformer_lm1b.py#L51.
      // Adjusted based on our sample size relative to Calypso's.
      "warmup_steps": 6000
    },
    "num_gradient_accumulation_steps": NUM_GRAD_ACC,
    "use_amp": true
  }
}

Latest update:

This error no longer appears after setting the max_tokens variable of my custom DatasetReader to a value below 5000, so that the tokenizer truncates every input to at most 5000 tokens. One of the AllenNLP contributors suggested the same fix, i.e. making sure the tokenizer truncates the input to 5000 tokens.

The same question was also posted on the AllenNLP discussion board: https://github.com/allenai/allennlp/discussions/5601
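
The snippet below is a minimal sketch of what such a truncating reader could look like; the registered name and the line-per-document file format are illustrative stand-ins, not the actual implementation of my mimic_reader.

# Minimal sketch of a DatasetReader that truncates each document to max_tokens,
# so no sequence exceeds the contextualizer's positional-encoding limit of 5000.
# The registered name and the file format are illustrative placeholders.
from typing import Dict, Iterable

from allennlp.data import DatasetReader, Instance, Token, TokenIndexer
from allennlp.data.fields import TextField
from allennlp.data.tokenizers import WhitespaceTokenizer


@DatasetReader.register("truncating_mimic_reader")  # hypothetical name
class TruncatingMimicReader(DatasetReader):
    def __init__(self, token_indexers: Dict[str, TokenIndexer], max_tokens: int = 5000, **kwargs):
        super().__init__(**kwargs)
        self._tokenizer = WhitespaceTokenizer()
        self._token_indexers = token_indexers
        self._max_tokens = max_tokens

    def text_to_instance(self, text: str) -> Instance:
        tokens = self._tokenizer.tokenize(text)
        # Truncate before adding <S>/</S> so the final length stays within the limit.
        tokens = tokens[: self._max_tokens - 2]
        tokens = [Token("<S>")] + tokens + [Token("</S>")]
        return Instance({"source": TextField(tokens, self._token_indexers)})

    def _read(self, file_path: str) -> Iterable[Instance]:
        # Assumes one document per line; adapt to the actual note format.
        with open(file_path) as f:
            for line in f:
                yield self.text_to_instance(line.strip())

With a reader like this, the cap can also be exposed in the config (e.g. a "max_tokens": 5000 entry inside the dataset_reader block), so it can be adjusted without touching the code.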