I am trying to add a custom NER label using spaCy 3. I found tutorials for older versions and adapted them for spaCy 3. Here is the full code I am using:
import random
import spacy
from spacy.training import Example

LABEL = 'ANIMAL'
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("Do they bite?", {'entities': []}),
    ("horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {'entities': [(48, 54, LABEL)]}),
    ("horses?", {'entities': [(0, 6, LABEL)]})
]

nlp = spacy.load('en_core_web_sm')  # load existing spaCy model
ner = nlp.get_pipe('ner')
ner.add_label(LABEL)
print(ner.move_names)  # Here I see that the new label was added
optimizer = nlp.create_optimizer()

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
        print(losses)

# test the trained model # add some dummy sentences with many NERs
test_text = 'Do you like horses?'
doc = nlp(test_text)
print("Entities in '%s'" % test_text)
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)
This code raises a ValueError, but only after 2 iterations (note the first two lines of output):
{'ner': 9.862242701536594}
{'ner': 8.169456698315201}
Traceback (most recent call last):
  File ".\custom_ner_training.py", line 46, in <module>
    nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
  File "C:ogrmojepythonspacy_pgmyvenvlibsite-packagesspacylanguage.py", line 1106, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
  File "spacy\pipeline\transition_parser.pyx", line 366, in spacy.pipeline.transition_parser.Parser.update
  File "spacy\pipeline\transition_parser.pyx", line 478, in spacy.pipeline.transition_parser.Parser.get_batch_loss
  File "spacy\pipeline\_parser_internals\ner.pyx", line 310, in spacy.pipeline._parser_internals.ner.BiluoPushDown.set_costs
ValueError
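I can see from the output of ner.move_names that the ANIMAL label was added.
When I change the value to LABEL = 'PERSON', the code runs successfully and recognizes horses as PERSON on new data. That is why I assume there is no error in the code itself.
Am I missing something? What am I doing wrong? Can anyone reproduce this?
Note: this is my first question here. I hope I have provided all the necessary information. If not, please let me know in the comments.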
You need to change the following line in the for loop from
doc = nlp(text)
to
doc = nlp.make_doc(text)
The code should then work and produce the following results:
{'ner': 9.60289144264557}
{'ner': 8.875474230820478}
{'ner': 6.370401408220459}
{'ner': 6.687456469517201}
...
{'ner': 1.3796682589133492e-05}
{'ner': 1.7709562613218738e-05}
Entities in 'Do you like horses?'
ANIMAL -- horses
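Another potential cause is misaligned label information in the corpus. Check whether the training data contains extra whitespace. If it does, strip the extra whitespace from the text first, and only then compute the start and end positions of each label in the text.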
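As a rough illustration of that check, here is a minimal sketch in plain Python (no spaCy needed). The helper names normalize_whitespace, find_span and misaligned_spans are made up for this example, and the word-boundary test is only a crude stand-in for spaCy's tokenizer, so treat a clean result as a hint rather than a guarantee:

```python
import re


def normalize_whitespace(text):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def find_span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return (start, start + len(phrase), label)


def misaligned_spans(text, entities):
    """Return spans that are out of range, touch whitespace at an edge,
    or cut through a word (approximated with str.isalnum)."""
    bad = []
    for start, end, label in entities:
        if not 0 <= start < end <= len(text):
            bad.append((start, end, label))
            continue
        chunk = text[start:end]
        touches_space = chunk != chunk.strip()
        cuts_start = start > 0 and text[start - 1].isalnum() and text[start].isalnum()
        cuts_end = end < len(text) and text[end - 1].isalnum() and text[end].isalnum()
        if touches_space or cuts_start or cuts_end:
            bad.append((start, end, label))
    return bad


raw = "those   horses  bite"
clean = normalize_whitespace(raw)          # "those horses bite"
span = find_span(clean, "horses", "ANIMAL")

print(misaligned_spans(clean, [span]))     # offsets match the cleaned text
print(misaligned_spans(raw, [span]))       # same offsets no longer fit the raw text
```

The point is that offsets must be computed on exactly the string that goes into training: here the span found in the cleaned text passes the check against clean but is reported when paired with the original raw string.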