Warning: [W030] Some entities could not be aligned in the text


TRAIN_DATA = [
("XYZxyzg hat die beste Camera für Selfies", {"entities": [(0, 7, "BRAND"), (23, 28, "CAMERA")]}),
]

When I train on this, I keep getting a warning on this line:

UserWarning: [W030] Some entities could not be aligned in the text "XYZxyzg hat die beste Camera für Selfie" with entities "[(0, 7, 'BRAND'), (23, 28, 'CAMERA')]". Use `spacy.gold.biluo_tags_from_offsets(nlp.make_doc(text), entities)` to check the alignment. Misaligned entities ('-') will be ignored during training.
gold = GoldParse(doc, **gold)

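What is wrong with my indices? Should I exclude whitespace? I tried that too, but it doesn't seem to work. How do I use spacy.gold.biluo_tags_from_offsets(nlp.make_doc(text), entities) to check the indices, as the warning suggests?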

From your post:

TRAIN_DATA = [
("XYZxyzg hat die beste Camera für Selfies", {"entities": [(0, 7, "BRAND"), (23, 28, "CAMERA")]}),
]

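Entity offsets need to align with token boundaries. An entity cannot start or end in the middle of a token. In your case it looks like a small off-by-one error: the span (23, 28) covers only "amera", so I think the offsets of the second entity should be (22, 28, "CAMERA").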
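Slicing the text with the offsets shows what each span actually covers. This is a minimal sketch in plain Python. The find_span helper is a hypothetical name introduced here for illustration and is not part of spaCy.

```python
text = "XYZxyzg hat die beste Camera für Selfies"


def find_span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase in text.

    Hypothetical helper for illustration only; it is not a spaCy function.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return (start, start + len(phrase), label)


# The original offsets start one character too late.
print(repr(text[23:28]))  # 'amera'
# The corrected offsets cover the whole token.
print(repr(text[22:28]))  # 'Camera'

# Computing offsets from the phrase avoids counting characters by hand.
entities = [
    find_span(text, "XYZxyzg", "BRAND"),
    find_span(text, "Camera", "CAMERA"),
]
print(entities)  # [(0, 7, 'BRAND'), (22, 28, 'CAMERA')]
```

Note that str.find returns only the first occurrence, so a phrase that appears more than once in a sentence would need extra handling.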

I believe spacy.gold.biluo_tags_from_offsets has been deprecated.

You can replace the import "from spacy.gold import biluo_tags_from_offsets" with "from spacy.training import offsets_to_biluo_tags".

https://spacy.io/api/top-level#offsets_to_biluo_tags
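A minimal sketch of the alignment check with offsets_to_biluo_tags, assuming spaCy v3 is installed. The blank German pipeline from spacy.blank("de") is an assumption standing in for your own nlp object, since only the tokenizer is needed here.

```python
import spacy
from spacy.training import offsets_to_biluo_tags

# Assumption: a blank German pipeline is enough, because only tokenization matters.
nlp = spacy.blank("de")

text = "XYZxyzg hat die beste Camera für Selfies"
doc = nlp.make_doc(text)

# Original offsets from the question: the second span starts inside "Camera".
misaligned = offsets_to_biluo_tags(doc, [(0, 7, "BRAND"), (23, 28, "CAMERA")])

# Corrected offsets: both spans start and end on token boundaries.
aligned = offsets_to_biluo_tags(doc, [(0, 7, "BRAND"), (22, 28, "CAMERA")])

# Print each token next to its tag; a "-" tag marks a misaligned entity.
for token, bad_tag, good_tag in zip(doc, misaligned, aligned):
    print(f"{token.text:10} {bad_tag:10} {good_tag}")
```

Any token tagged "-" in the first column of tags belongs to a span that does not line up with the tokenization. Those are the entities the W030 warning says will be ignored during training.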
