Unable to load a model trained in Gensim: pickle-related error



When I try to load a Gensim-trained word2vec model on a Windows machine, I get the following error:

AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>

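I have successfully trained many models with Gensim on this system in the past. The only change is that this time I separated the model.build_vocab() and model.train() stages, with a time check at each epoch. I also used different iterators for the vocab-building and training phases, but with the same tokenization pipeline on the same dataset.

Here is how I did the epoch progress tracking/saving: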

import time

from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile

class EpochProgress(CallbackAny2Vec):
    '''saves the model after each epoch'''

    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
        self.start_time = time.time()

    def on_epoch_begin(self, model):
        print("epoch #{} started".format(self.epoch))

    def on_epoch_end(self, model):
        print("epoch #{} completed".format(self.epoch))
        passed = (time.time() - self.start_time)/60/60  # elapsed time since start in HOURS
        print("{} hours have passed".format(str(passed)))
        # save a checkpoint named after the current epoch
        output_path = get_tmpfile('{}_epoch{}.model'.format(self.path_prefix, self.epoch))
        model.save(output_path)
        print("model saved at: {}".format(output_path))
        self.epoch += 1

epoch_progress = EpochProgress('E:/jade_prism/embeddings/phrase-embed-over-time/mega_WOS_word2vec/w2v_models/in_progress/')

Then I loaded the baseline model, which already had its vocab built, and set some parameters:

model = gensim.models.Word2Vec.load(baseline_models_directory+chosen_name)
model.window = window
model.size = size
model.workers = workers 
model.callbacks = [epoch_progress]

Then I ran the training like this:

model.train(corpus, total_examples=model.corpus_count, epochs=epochs)

Finally, I saved the final product like this:

model.save('E:/w2v_models/trained/{}'.format(new_model_filename))

Training seemed to work fine and the model saved as expected. Unfortunately, I now cannot load it.

Here is the full debug readout:

> AttributeError                            Traceback (most recent call last)
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
>    1329         try:
> -> 1330             model = super(Word2Vec, cls).load(*args, **kwargs)
>    1331 
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, *args, **kwargs)
>    1243         """
> -> 1244         model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
>    1245         if not hasattr(model, 'ns_exponent'):
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, fname_or_handle, **kwargs)
>     602         """
> --> 603         return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
>     604 
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in load(cls, fname, mmap)
>     425 
> --> 426         obj = unpickle(fname)
>     427         obj._load_specials(fname, mmap, compress, subname)
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in unpickle(fname)
>    1383         if sys.version_info > (3, 0):
> -> 1384             return _pickle.load(f, encoding='latin1')
>    1385         else:
> 
> AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
> 
> During handling of the above exception, another exception occurred:
> 
> AttributeError                            Traceback (most recent call last)
> <ipython-input-4-0206f9f8f3ad> in <module>
>       3 
>       4 # Load the model based onthe model name
> ----> 5 model = gensim.models.Word2Vec.load(model_name)
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
>    1339             logger.info('Model saved using code from earlier Gensim Version. Re-loading old model in a compatible way.')
>    1340             from gensim.models.deprecated.word2vec import load_old_word2vec
> -> 1341             return load_old_word2vec(*args, **kwargs)
>    1342 
>    1343 
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load_old_word2vec(*args, **kwargs)
>     170 
>     171 def load_old_word2vec(*args, **kwargs):
> --> 172     old_model = Word2Vec.load(*args, **kwargs)
>     173     vector_size = getattr(old_model, 'vector_size', old_model.layer1_size)
>     174     params = {
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load(cls, *args, **kwargs)
>    1639     @classmethod
>    1640     def load(cls, *args, **kwargs):
> -> 1641         model = super(Word2Vec, cls).load(*args, **kwargs)
>    1642         # update older models
>    1643         if hasattr(model, 'table'):
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in load(cls, fname, mmap)
>      85         compress, subname = SaveLoad._adapt_by_suffix(fname)
>      86 
> ---> 87         obj = unpickle(fname)
>      88         obj._load_specials(fname, mmap, compress, subname)
>      89         logger.info("loaded %s", fname)
> 
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in unpickle(fname)
>     377             b'gensim.models.wrappers.fasttext', b'gensim.models.deprecated.fasttext_wrapper')
>     378         if sys.version_info > (3, 0):
> --> 379             return _pickle.loads(file_bytes, encoding='latin1')
>     380         else:
>     381             return _pickle.loads(file_bytes)
> 
> AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>

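Python pickling/unpickling can have trouble with code blocks, or with classes/instances that were defined before the save but may not be available at load time. (This applies especially to anonymous or global-scope types that are not imported from an explicit path.)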
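To make this concrete, here is a minimal standard-library sketch. It uses a plain stand-in class named EpochProgress (an assumption to keep the example dependency-free; it does not subclass gensim's CallbackAny2Vec) and checks that the pickled bytes carry the class's module and name plus the instance attributes, but not the class body:

```python
import pickle


class EpochProgress:
    """Stand-in for the callback in the question; not a gensim class."""

    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0

    def on_epoch_end(self, model):
        self.epoch += 1


payload = pickle.dumps(EpochProgress("in_progress/"))

# pickle records *where to find* the class (module name + class name) ...
module_name = EpochProgress.__module__.encode()
class_name = EpochProgress.__qualname__.encode()
print("module recorded:", module_name in payload)
print("class name recorded:", class_name in payload)

# ... and the instance attributes, but none of the class's code.
print("attribute recorded:", b"path_prefix" in payload)
print("method recorded:", b"on_epoch_end" in payload)
```

Because only the reference is stored, unpickling has to look the class up again under that same module and name, which is why a class defined in `__main__` during training must exist in `__main__` again at load time.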

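This is a known issue with gensim model saving, and future versions may avoid storing such callback code in the model altogether. (Instead, callbacks would have to be specified each time a method that uses them is called, and they would apply only to that one call.)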
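The per-call design described in the parentheses can be sketched with a toy class. ToyModel and its train() signature are hypothetical and only illustrate the idea; this is not gensim's implementation:

```python
class ToyModel:
    """Hypothetical stand-in for a trainable model; not a gensim class."""

    def __init__(self):
        self.epochs_trained = 0

    def train(self, epochs, callbacks=()):
        # Callbacks are used for this call only and are never stored on
        # self, so they cannot end up in the model's saved state.
        for _ in range(epochs):
            for callback in callbacks:
                callback(self)
            self.epochs_trained += 1


log = []
model = ToyModel()
model.train(epochs=3, callbacks=[lambda m: log.append(m.epochs_trained)])

# The callbacks ran during training, but the model's state holds no trace of them.
print(log)
print(vars(model))
```

With this shape, saving the model never depends on the callback class being importable later.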

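For more details, see gensim project issue #2136, which includes a workaround that seems to have helped others reload their models: make sure the same EpochProgress class is defined/imported in the place where you attempt the load.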
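The effect of that workaround can be reproduced with the standard library alone. In this sketch a hypothetical module named training_script stands in for the script or notebook session, and a plain class stands in for the gensim callback (both are assumptions for illustration). Loading fails with the same kind of AttributeError until the class is defined again under the same name:

```python
import pickle
import sys
import types

# Source of the stand-in callback class (plain class, not a gensim subclass).
CLASS_SOURCE = '''
class EpochProgress:
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
'''


def new_session():
    """Simulate a fresh interpreter: an empty module under the same name."""
    module = types.ModuleType("training_script")  # hypothetical module name
    sys.modules["training_script"] = module
    return module


# "Training" session: define the class, then pickle an instance of it.
training = new_session()
exec(CLASS_SOURCE, training.__dict__)
saved = pickle.dumps(training.EpochProgress("in_progress/"))

# "Loading" session: the class has not been defined yet, so the load fails.
loading = new_session()
try:
    pickle.loads(saved)
    load_error = None
except AttributeError as err:
    load_error = err
print("without the class:", load_error)

# Workaround: define the same class again before loading.
exec(CLASS_SOURCE, loading.__dict__)
restored = pickle.loads(saved)
print("with the class:", restored.path_prefix, restored.epoch)
```

For the model in the question, the equivalent step is to run (or import) the EpochProgress class definition in the loading script or notebook before calling gensim.models.Word2Vec.load(model_name).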
