SpaCy: OSError: [E050] Can't find model on Google Colab | Python



I am trying to lemmatize Spanish text with the Spanish core model es_core_news_sm. However, I get an OSError.

The following code is an example of spaCy lemmatization on Google Colab:

import spacy
spacy.prefer_gpu()
nlp = spacy.load('es_core_news_sm')
text = 'yo canto, tú cantas, ella canta, nosotros cantamos, cantáis, cantan…'
doc = nlp(text)
lemmas = [tok.lemma_.lower() for tok in doc]

I also tried importing the model package directly, but that did not work either and produced a similar traceback.

import es_core_news_sm
nlp = es_core_news_sm.load()

Traceback:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-93-fd65d69a4f87> in <module>()
2 spacy.prefer_gpu()
3 
----> 4 nlp = spacy.load('es_core_web_sm')
5 text = 'yo canto, tú cantas, ella canta, nosotros cantamos, cantáis, cantan…'
6 doc = nlp(text)
1 frames
/usr/local/lib/python3.6/dist-packages/spacy/util.py in load_model(name, **overrides)
137     elif hasattr(name, "exists"):  # Path or Path-like to model data
138         return load_model_from_path(name, **overrides)
--> 139     raise IOError(Errors.E050.format(name=name))
140 
141 
OSError: [E050] Can't find model 'es_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

You first need to download the model data:

!spacy download es_core_news_sm
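Equivalently, the download can be triggered from Python itself. A minimal sketch using spaCy's own helpers (the is_package guard is optional; the model name is the one from the question):

import spacy
from spacy.cli import download

# download the Spanish pipeline package only if it is not installed yet
if not spacy.util.is_package('es_core_news_sm'):
    download('es_core_news_sm')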

Then restart the runtime, after which your code will run correctly:

import spacy
spacy.prefer_gpu()
nlp = spacy.load('es_core_news_sm')
text = 'yo canto, tú cantas, ella canta, nosotros cantamos, cantáis, cantan…'
doc = nlp(text)
lemmas = [tok.lemma_.lower() for tok in doc]
print(len(lemmas))
16
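If you want to inspect what the lemmatizer actually produced rather than just the count, a small sketch building on the doc from the code above:

# pair every surface token with its lowercased lemma
for tok in doc:
    print(f'{tok.text:>10} -> {tok.lemma_.lower()}')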

I ran into a similar problem and did the following. You need torchtext for this example.

import spacy

spacy_de = spacy.load('de_core_news_sm')
spacy_en = spacy.load('en_core_web_sm')
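As with the Spanish model above, both pipelines must be downloaded before they can be loaded; in a notebook that would be:

!spacy download de_core_news_sm
!spacy download en_core_web_sm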

I call the tokenizer through a function. For example:

def tokenize_de(text):
    """
    Tokenizes German text from a string into a list of strings (tokens) and reverses it
    """
    return [tok.text for tok in spacy_de.tokenizer(text)][::-1]

def tokenize_en(text):
    """
    Tokenizes English text from a string into a list of strings (tokens)
    """
    return [tok.text for tok in spacy_en.tokenizer(text)]
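A quick sanity check of the German tokenizer (the sample phrase is made up; note the reversed token order):

print(tokenize_de('Guten Morgen'))  # ['Morgen', 'Guten'] -- tokens come back reversed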
---------
from torchtext.data import Field  # torchtext < 0.9; newer versions moved this to torchtext.legacy.data

MAX_LEN = 100  # maximum sequence length; an arbitrary choice, pick what fits your data

SRC = Field(tokenize = tokenize_de,
            init_token = '<sos>',
            eos_token = '<eos>',
            fix_length = MAX_LEN,
            lower = True)

TRG = Field(tokenize = tokenize_en,
            init_token = '<sos>',
            eos_token = '<eos>',
            fix_length = MAX_LEN,
            lower = True)
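These fields are typically attached to a parallel dataset and then used to build the vocabularies. A hedged sketch using torchtext's Multi30k corpus (any German-English dataset would work the same way; min_freq = 2 is an arbitrary choice):

from torchtext.datasets import Multi30k  # torchtext < 0.9

# attach the fields to the German->English Multi30k corpus
train_data, valid_data, test_data = Multi30k.splits(exts = ('.de', '.en'),
                                                    fields = (SRC, TRG))

# build the vocabularies from the training split only
SRC.build_vocab(train_data, min_freq = 2)
TRG.build_vocab(train_data, min_freq = 2)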
