Error when loading a custom model with spaCy



I'm trying to load a custom model called 'ru2' into spaCy (for NLP processing).

It can be found here: https://github.com/buriy/spacy-ru

The problem is that when I call

import spacy

nlp = spacy.load('ru2')
doc = nlp(text)

I get this error:

C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:205: RuntimeWarning: spacy.tokens.span.Span size changed, may indicate binary incompatibility. Expected 72 from C header, got 80 from PyObject
  return f(*args, **kwds)
Traceback (most recent call last):
  File "C://.../nlp/src/ie/main.py", line 125, in <module>
    main(examp_dict['Poroshenko'])
  File "C://.../nlp/src/ie/main.py", line 92, in main
    nlp = spacy.load('ru2')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 133, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 791, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 630, in from_disk
    reader(path / key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 781, in <lambda>
    deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
  File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
  File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 606, in from_bytes
    msg = srsly.msgpack_loads(bytes_data)
  File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\_msgpack_api.py", line 29, in msgpack_loads
    msg = msgpack.loads(data, raw=False, use_list=use_list)
  File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\msgpack\__init__.py", line 60, in unpackb
    return _unpackb(packed, **kwargs)
  File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
TypeError: unhashable type: 'list'

I searched the internet for similar problems:

  • https://github.com/explosion/spacy/issues/2715
  • https://spacy.io/usage#unhashable-list

But none of these solutions worked for me.

I'm using:

  • msgpack == 0.5.6 (I even downgraded it, as suggested in the links above)
  • spacy == 2.1.4
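To double-check which versions are actually installed in the environment that runs the script, a small sketch like the following can help. It assumes Python 3.8+ (for importlib.metadata), and the function name installed_versions is hypothetical. Judging by the traceback, the failing call goes through srsly.msgpack, i.e. the copy of msgpack bundled inside the srsly package, so the version of the standalone msgpack package may not be the one that matters here.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages=("spacy", "srsly", "msgpack")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```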

From https://spacy.io/usage#troubleshooting:

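If you're training models, writing them to disk and versioning them with git, you might encounter this error when trying to load them in a Windows environment. This happens because a default install of Git for Windows is configured to automatically convert Unix-style end-of-line characters (LF) to Windows-style ones (CRLF) during file checkout (and the reverse when committing). While that's mostly fine for text files, a trained model written to disk has some binary files that should never go through this conversion. When they do, you get the error above. You can fix it by either changing your core.autocrlf setting to "false", or by committing a .gitattributes file to your repository that tells git on which files or folders it shouldn't do the LF-to-CRLF conversion, with an entry like path/to/spacy/model/** -text. After you've done either of these, clone your repository again.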
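The global alternative is `git config --global core.autocrlf false`. For the per-repository .gitattributes route, here is a minimal Python sketch that appends the `-text` rule if it is missing. The function name add_no_text_rule and the example path models/ru2 are hypothetical; after running it you would still commit .gitattributes and re-clone, as the quote says.

```python
from pathlib import Path


def add_no_text_rule(repo_root, model_rel_path):
    """Append '<model_rel_path>/** -text' to <repo_root>/.gitattributes if missing.

    Returns True if the rule was added, False if it was already present.
    """
    rule = f"{model_rel_path.rstrip('/')}/** -text"
    attributes = Path(repo_root) / ".gitattributes"
    text = attributes.read_text(encoding="utf-8") if attributes.exists() else ""
    if rule in (line.strip() for line in text.splitlines()):
        return False
    # Start the rule on its own line if the file doesn't end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    # newline="\n" keeps the file LF-only even when run on Windows.
    with attributes.open("a", encoding="utf-8", newline="\n") as f:
        f.write(f"{prefix}{rule}\n")
    return True


if __name__ == "__main__":
    # Hypothetical layout: the model lives in models/ru2 inside the current repo.
    print(add_no_text_rule(".", "models/ru2"))
```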

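It could be because the version of spaCy used to create the model differs from the installed spaCy version. (Of course, I don't know for sure; I'm just mentioning it in case it helps.)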
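One way to check this is to compare the version requirement recorded with the model against the installed spaCy. The sketch below assumes the model directory contains a meta.json with a "spacy_version" field, as spaCy v2 model packages usually do; whether the ru2 model ships one is an assumption. Both function names are hypothetical, and the code only reports the two values rather than deciding compatibility.

```python
import json
from importlib import metadata  # Python 3.8+
from pathlib import Path


def model_spacy_requirement(model_dir):
    """Return the 'spacy_version' string from <model_dir>/meta.json, or None."""
    meta_path = Path(model_dir) / "meta.json"
    if not meta_path.exists():
        return None
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    return meta.get("spacy_version")


def installed_spacy_version():
    """Return the installed spaCy version string, or None if spaCy is missing."""
    try:
        return metadata.version("spacy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    model_dir = "ru2"  # hypothetical: path to the model directory
    print("model expects spaCy:", model_spacy_requirement(model_dir))
    print("installed spaCy:    ", installed_spacy_version())
```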

Adding to the answer above: another quick fix is to download the repository manually as a ZIP archive instead of cloning it.
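Downloading the archive bypasses your local Git's core.autocrlf setting, and Python's zipfile extracts files byte-for-byte. A minimal sketch, assuming the default branch is named "master" (the archive URL pattern and both function names are assumptions, not something the repository documents):

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Assumption: the repository's default branch is "master".
ARCHIVE_URL = "https://github.com/buriy/spacy-ru/archive/refs/heads/master.zip"


def extract_zip_bytes(data, dest):
    """Extract a ZIP archive given as bytes into dest; return its top-level names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)  # binary extraction, no line-ending conversion
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})


def download_and_extract(url=ARCHIVE_URL, dest="spacy-ru"):
    """Fetch the archive over HTTPS and extract it into dest."""
    with urllib.request.urlopen(url) as response:
        return extract_zip_bytes(response.read(), dest)
```

If the model sits in a ru2 folder of the repository (also an assumption), spaCy 2.x should then be able to load it by path, e.g. spacy.load("spacy-ru/spacy-ru-master/ru2").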
