I am new to the huggingface library and am trying to run a model for masked language modeling (the "fill-mask" task):
from transformers import BertTokenizer, BertForMaskedLM
import torch
from transformers import pipeline, AutoTokenizer, AutoModel
# Initialize MLM pipeline
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
print(len(tokenizer.vocab))
>>> 28996
But when I try to get the token probabilities, I get an error:
classifier = pipeline("fill-mask", model=model, tokenizer=tokenizer)
results = classifier("Paris is the [MASK] of France.")
>>>KeyError Traceback (most recent call last)
<ipython-input-15-30c429f29424> in <module>()
1 classifier = pipeline("fill-mask", model=model, tokenizer=tokenizer)
----> 2 results = classifier("Paris is the [MASK] of France.")
4 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
2041 if isinstance(k, str):
2042 inner_dict = {k: v for (k, v) in self.items()}
-> 2043 return inner_dict[k]
2044 else:
2045 return self.to_tuple()[k]
KeyError: 'logits'
I also tried the following from a different tutorial, but got the same error:
mlm = pipeline('fill-mask', model=model, tokenizer=tokenizer)
# Get mask token
mask = mlm.tokenizer.mask_token
# Get result for particular masked phrase
phrase = f'Paris is the {mask} of France.'
result = mlm(phrase, top_k=10000)
# Print result
print(result)
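You are using the pipeline incorrectly. AutoModel loads only the bare encoder, without the masked-language-modeling head, so its output has no 'logits' entry, hence the KeyError. The simplest fix is to pass the model name to the model parameter and let the pipeline load the right class for the task. It should look like this: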
model_name = "emilyalsentzer/Bio_ClinicalBERT"
classifier = pipeline("fill-mask", model=model_name, tokenizer=tokenizer)
results = classifier("Paris is the [MASK] of France.")
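If you want to look at the results:
for i, result in enumerate(results):
    # Each result is a dict with the predicted token and its probability
    print(f"the {i}-th result={result['token_str']} has score {result['score']}")
the output will be: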
the 0-th result=cause has score 0.1672661453485489
the 1-th result=site has score 0.14680784940719604
the 2-th result=source has score 0.12052636593580246
the 3-th result=area has score 0.07053395360708237
the 4-th result=sign has score 0.05601896718144417
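So I am not sure the model you are using is a good choice for predicting this [MASK]. Bio_ClinicalBERT was trained on clinical text, which would explain medical-sounding guesses like "cause" and "site".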
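To see why the original code raised KeyError: 'logits', here is a minimal sketch that compares the two model classes. It assumes a tiny, randomly initialised BERT (the config sizes and the token ids are made up for illustration) so that nothing has to be downloaded. BertModel is the class AutoModel resolves to for a BERT checkpoint, and BertForMaskedLM is the class AutoModelForMaskedLM resolves to.

```python
import torch
from transformers import BertConfig, BertModel, BertForMaskedLM

# Hypothetical tiny config, chosen only to keep the example fast and offline.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

torch.manual_seed(0)
base_model = BertModel(config).eval()       # bare encoder, no MLM head
mlm_model = BertForMaskedLM(config).eval()  # encoder plus MLM head

# One fake sequence of six token ids (not real text).
input_ids = torch.tensor([[1, 5, 7, 9, 11, 2]])

with torch.no_grad():
    base_out = base_model(input_ids=input_ids)
    mlm_out = mlm_model(input_ids=input_ids)

# The bare encoder returns hidden states only; the MLM model returns logits.
base_has_logits = "logits" in base_out.keys()
mlm_has_logits = "logits" in mlm_out.keys()
print("BertModel output keys:", list(base_out.keys()))
print("BertForMaskedLM output keys:", list(mlm_out.keys()))

# Logits have shape (batch, sequence_length, vocab_size).
logits_shape = tuple(mlm_out.logits.shape)
print("logits shape:", logits_shape)

# Token probabilities at one position (index 3 stands in for the mask position):
# a softmax over the vocabulary dimension.
probs = torch.softmax(mlm_out.logits[0, 3], dim=-1)
top = torch.topk(probs, k=5)
print("top-5 token ids:", top.indices.tolist())
```

With the real checkpoint, the same idea would mean loading the model with AutoModelForMaskedLM.from_pretrained("emilyalsentzer/Bio_ClinicalBERT") instead of AutoModel before passing it to pipeline("fill-mask", ...). That variant was not run here, so treat it as a suggestion rather than a verified result.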