Hugging Face | PipelineException: No mask_token (<mask>) found on the input



Goal: loop over multiple models in a for-loop and print() each model's run time.

Processing a single model works fine:

import time
import pandas as pd
from transformers import pipeline

df = pd.DataFrame(columns=['Model', 'Time'])
i = 0
start = time.time()
unmasker = pipeline('fill-mask', model=models[i])  # models is the list defined below
unmasker("Hello I'm a [MASK] model.", top_k=1)
end = time.time()
df = df.append({'Model': models[i], 'Time': end - start}, ignore_index=True)
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

However, iterating over many model names raises the error in the title.


Code:

from transformers import pipeline
import time
models = ['bert-base-uncased', 'roberta-base', 'distilbert-base-uncased', 'bert-base-cased', 'albert-base-v2', 'roberta-large', 'bert-large-uncased', 'albert-large-v2', 'albert-base-v2', 'bert-large-cased', 'albert-base-v1', 'bert-large-cased-whole-word-masking', 'bert-large-uncased-whole-word-masking', 'albert-xxlarge-v2', 'google/bigbird-roberta-large', 'albert-xlarge-v2', 'albert-xxlarge-v1', 'facebook/muppet-roberta-large', 'facebook/muppet-roberta-base', 'albert-large-v1', 'albert-xlarge-v1']
for _model in models:
    start = time.time()
    unmasker = pipeline('fill-mask', model=_model)
    unmasker("Hello I'm a [MASK] model.", top_k=1)  # default: top_k=5
    end = time.time()
    print(end - start)
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
---------------------------------------------------------------------------
PipelineException                         Traceback (most recent call last)
<ipython-input-19-13b5f651657e> in <module>
3     start = time.time()
4     unmasker = pipeline('fill-mask', model=_model)
----> 5     unmasker("Hello I'm a [MASK] model.", top_k=1)  # default: top_k=5
6     end = time.time()
7 
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/pipelines/fill_mask.py in __call__(self, inputs, *args, **kwargs)
224             - **token** (`str`) -- The predicted token (to replace the masked one).
225         """
--> 226         outputs = super().__call__(inputs, **kwargs)
227         if isinstance(inputs, list) and len(inputs) == 1:
228             return outputs[0]
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/pipelines/base.py in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
1099                 return self.iterate(inputs, preprocess_params, forward_params, postprocess_params)
1100         else:
-> 1101             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
1102 
1103     def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/pipelines/base.py in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
1105 
1106     def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
-> 1107         model_inputs = self.preprocess(inputs, **preprocess_params)
1108         model_outputs = self.forward(model_inputs, **forward_params)
1109         outputs = self.postprocess(model_outputs, **postprocess_params)
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/pipelines/fill_mask.py in preprocess(self, inputs, return_tensors, **preprocess_parameters)
82             return_tensors = self.framework
83         model_inputs = self.tokenizer(inputs, return_tensors=return_tensors)
---> 84         self.ensure_exactly_one_mask_token(model_inputs)
85         return model_inputs
86 
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/pipelines/fill_mask.py in ensure_exactly_one_mask_token(self, model_inputs)
76         else:
77             for input_ids in model_inputs["input_ids"]:
---> 78                 self._ensure_exactly_one_mask_token(input_ids)
79 
80     def preprocess(self, inputs, return_tensors=None, **preprocess_parameters) -> Dict[str, GenericTensor]:
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/pipelines/fill_mask.py in _ensure_exactly_one_mask_token(self, input_ids)
67                 "fill-mask",
68                 self.model.base_model_prefix,
---> 69                 f"No mask_token ({self.tokenizer.mask_token}) found on the input",
70             )
71 
PipelineException: No mask_token (<mask>) found on the input

Please let me know if there is anything I can add to the post to clarify.

Only certain models throw this error.
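The failing checkpoints are the RoBERTa-style ones (e.g. roberta-base, roberta-large, the facebook/muppet-roberta-* variants), whose tokenizers use <mask> as the mask token rather than BERT's [MASK], so the hard-coded [MASK] string is never found in the input. A minimal sketch that sidesteps this by reading the mask token off each pipeline's own tokenizer (the fill-mask pipeline exposes it as unmasker.tokenizer.mask_token):

from transformers import pipeline

# Build the prompt from the model's own mask token instead of hard-coding [MASK].
unmasker = pipeline('fill-mask', model='roberta-base')
prompt = f"Hello I'm a {unmasker.tokenizer.mask_token} model."
print(unmasker(prompt, top_k=1))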

Since I'm only experimenting with each model's run time, the following is sufficient; it successfully handles most of the models.

I applied try/except logic. Note that catching an exception without naming the specific error in the except clause is considered bad practice; a variant that names the exception follows the code below.

for _model in models:
    for i in range(10):  # 10 timed trials per model
        start = time.time()
        try:
            unmasker = pipeline('fill-mask', model=_model)
            unmasker("Hello I'm a [MASK] model.", top_k=1)  # default: top_k=5
            print(_model)
        except: continue
        end = time.time()
        df = df.append({'Model': _model, 'Time': end - start}, ignore_index=True)
print(df)
df.to_csv('model_performance.csv', index=False)
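As noted above, the bare except is bad practice. A minimal sketch of a named-exception variant, assuming PipelineException can be imported from transformers.pipelines.base (the module shown in the traceback):

from transformers.pipelines.base import PipelineException

for _model in models:
    for i in range(10):
        start = time.time()
        try:
            unmasker = pipeline('fill-mask', model=_model)
            unmasker("Hello I'm a [MASK] model.", top_k=1)
        except PipelineException:
            # Skip models whose tokenizer uses a mask token other than [MASK].
            continue
        end = time.time()
        df = df.append({'Model': _model, 'Time': end - start}, ignore_index=True)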

Latest update