I have no problem using the default model in the sentiment-analysis pipeline.
# Allocate a pipeline for sentiment-analysis
nlp = pipeline('sentiment-analysis')
nlp('I am a black man.')
>>>[{'label': 'NEGATIVE', 'score': 0.5723695158958435}]
However, when I try to customize the pipeline slightly by passing a specific model, it throws a KeyError.
nlp = pipeline('sentiment-analysis',
tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/bert-base-cased-conversational"),
model = AutoModelWithLMHead.from_pretrained("DeepPavlov/bert-base-cased-conversational"))
nlp('I am a black man.')
>>>---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-55-af7e46d6c6c9> in <module>
3 tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/bert-base-cased-conversational"),
4 model = AutoModelWithLMHead.from_pretrained("DeepPavlov/bert-base-cased-conversational"))
----> 5 nlp('I am a black man.')
6
7
~/opt/anaconda3/lib/python3.7/site-packages/transformers/pipelines.py in __call__(self, *args, **kwargs)
721 outputs = super().__call__(*args, **kwargs)
722 scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
--> 723 return [{"label": self.model.config.id2label[item.argmax()], "score": item.max().item()} for item in scores]
724
725
~/opt/anaconda3/lib/python3.7/site-packages/transformers/pipelines.py in <listcomp>(.0)
721 outputs = super().__call__(*args, **kwargs)
722 scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
--> 723 return [{"label": self.model.config.id2label[item.argmax()], "score": item.max().item()} for item in scores]
724
725
KeyError: 58129
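One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the pipeline applies a softmax to the model outputs and looks up the argmax position in config.id2label. A model loaded with AutoModelWithLMHead produces scores over the vocabulary rather than one score per sentiment label, so the argmax can be an index that has no label. The sketch below reproduces that kind of lookup with made-up numbers. The names classify, id2label, and lm_logits are hypothetical stand-ins, not the library's code.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a flat list of scores
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, id2label):
    # Same shape of lookup as in the traceback: argmax index -> label name
    scores = softmax(logits)
    best = max(range(len(scores)), key=scores.__getitem__)
    return {"label": id2label[best], "score": scores[best]}

# Hypothetical two-class label map, as a sentiment model would carry
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# A classification head yields one score per label, so the lookup works
ok = classify([0.3, -0.1], id2label)

# A language-modeling head yields many more scores than there are labels;
# here 10 made-up scores stand in for a vocabulary-sized output
lm_logits = [0.0] * 10
lm_logits[7] = 5.0
try:
    classify(lm_logits, id2label)
    failed_key = None
except KeyError as err:
    failed_key = err.args[0]  # the index that has no label
```

If this reading is right, loading the checkpoint with a sequence-classification head instead of a language-modeling head would be the thing to try, though a checkpoint that was never fine-tuned for sentiment would still give meaningless labels.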
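I am facing the same problem. I am using an XLM-R model fine-tuned on the SQuAD v2 dataset ("a-ware/xlmroberta-squadv2"). In my case, the KeyError is 16.
Link
While looking for help with this issue I found this information: link. I hope it helps you.
Answer (from the link)
The pipeline raises an exception when the model predicts a token that is not part of the document (for example, the final special token [SEP]).
My problem: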
from transformers import XLMRobertaTokenizer, XLMRobertaForQuestionAnswering
from transformers import pipeline
nlp = pipeline('question-answering',
model = XLMRobertaForQuestionAnswering.from_pretrained('a-ware/xlmroberta-squadv2'),
tokenizer= XLMRobertaTokenizer.from_pretrained('a-ware/xlmroberta-squadv2'))
nlp(question = "Who was Jim Henson?", context ="Jim Henson was a nice puppet")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-15-b5a8ece5e525> in <module>()
1 context = "Jim Henson was a nice puppet"
2 # --------------- CON INTERROGACIONES
----> 3 nlp(question = "Who was Jim Henson?", context =context)
1 frames
/usr/local/lib/python3.6/dist-packages/transformers/pipelines.py in <listcomp>(.0)
1745 ),
1746 }
-> 1747 for s, e, score in zip(starts, ends, scores)
1748 ]
1749
KeyError: 16
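To illustrate the explanation quoted from the link, here is a toy sketch, not the pipeline's actual code. It assumes the pipeline keeps a dictionary from token positions to context words that covers only context tokens. The names token_to_word and span_to_answer are hypothetical. When a predicted start or end position falls on a special token, the dictionary has no entry for it and the lookup raises KeyError with that position as the key.

```python
# Toy token sequence for a question/context pair; positions are list indices
tokens = ["<s>", "Who", "was", "Jim", "?", "</s>", "</s>",
          "Jim", "was", "a", "puppet", "</s>"]
context_words = ["Jim", "was", "a", "puppet"]

# Hypothetical map: only context token positions (7..10) have an entry
token_to_word = {7: 0, 8: 1, 9: 2, 10: 3}

def span_to_answer(start, end):
    # Translate token positions into context words;
    # special tokens are not in the map, so they raise KeyError
    first, last = token_to_word[start], token_to_word[end]
    return " ".join(context_words[first:last + 1])

inside = span_to_answer(9, 10)  # both positions are context tokens

try:
    span_to_answer(9, 11)  # position 11 is the final special token
    missing = None
except KeyError as err:
    missing = err.args[0]
```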
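Solution 1: Add punctuation at the end of the context
To avoid the error raised when the pipeline tries to extract the final token (which may be a special token such as [SEP]), I added one more element (in this case a punctuation mark) to the end of the context: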
nlp(question = "Who was Jim Henson?", context ="Jim Henson was a nice puppet.")
[OUT]
{'answer': 'nice puppet.', 'end': 28, 'score': 0.5742837190628052, 'start': 17}
Solution 2: Don't use pipeline()
The raw model can be used on its own to retrieve the correct token indices.
from transformers import XLMRobertaTokenizer, XLMRobertaForQuestionAnswering
import torch
tokenizer = XLMRobertaTokenizer.from_pretrained('a-ware/xlmroberta-squadv2')
model = XLMRobertaForQuestionAnswering.from_pretrained('a-ware/xlmroberta-squadv2')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
# Encode the question/context pair as PyTorch tensors
encoding = tokenizer(question, text, return_tensors='pt')
input_ids = encoding['input_ids']
attention_mask = encoding['attention_mask']
# The first two outputs are the start and end logits
start_scores, end_scores = model(input_ids, attention_mask=attention_mask, output_attentions=False)[:2]
all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
# Take the tokens between the most likely start and end positions
answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1])
# Convert the tokens back to ids and decode them into a string
answer = tokenizer.convert_tokens_to_ids(answer.split())
answer = tokenizer.decode(answer)
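If the failure does come from a predicted position landing on a special token, one possible safeguard when decoding by hand is to ignore positions outside the context before taking the argmax. The sketch below uses plain lists and made-up scores. The names best_context_span and context_mask are hypothetical, and this is not a description of what the pipeline does internally.

```python
def best_context_span(start_scores, end_scores, context_mask):
    # Candidate positions are limited to context tokens (mask value True)
    allowed = [i for i, keep in enumerate(context_mask) if keep]
    start = max(allowed, key=lambda i: start_scores[i])
    # The end must not come before the start
    end = max((i for i in allowed if i >= start), key=lambda i: end_scores[i])
    return start, end

# Made-up scores for six positions; position 5 is a final special token
start_scores = [0.1, 0.2, 0.1, 2.0, 0.3, 0.5]
end_scores = [0.1, 0.1, 0.2, 0.4, 1.0, 3.0]
context_mask = [False, False, True, True, True, False]

span = best_context_span(start_scores, end_scores, context_mask)
```

In this toy data an unrestricted argmax over end_scores would pick position 5, the special token, while the masked version stays inside the context.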
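Update
Looking at your case in more detail, I found that the default model for the conversational task in the pipeline is distilbert-base-cased (source code).
The first solution I posted is indeed not a good one. When I tried other questions, I got the same error. However, the model itself works fine outside the pipeline (as I show in Solution 2). So I suspect that not every model can be used in a pipeline. If anyone has more information about this, please help us. Thanks.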