When trying to load a Gensim-trained word2vec model on a Windows machine, I get the following error:
AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
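I have successfully trained many models with Gensim on this system in the past. The only change is that this time I separated the model.build_vocab() and model.train() stages and hacked in per-epoch timing. I also used different iterators for the vocab-building and training phases, but over the same dataset and with the same tokenization pipeline.
Here is how I did the epoch progress tracking/saving: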
import time

from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile

class EpochProgress(CallbackAny2Vec):
    '''saves the model after each epoch'''
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
        self.start_time = time.time()

    def on_epoch_begin(self, model):
        print("epoch #{} started".format(self.epoch))

    def on_epoch_end(self, model):
        print("epoch #{} completed".format(self.epoch))
        passed = (time.time() - self.start_time)/60/60  # elapsed time since start in HOURS
        print("{} hours have passed".format(str(passed)))
        output_path = get_tmpfile('{}_epoch{}.model'.format(self.path_prefix, self.epoch))
        model.save(output_path)
        print("model saved at: {}".format(output_path))
        self.epoch += 1

epoch_progress = EpochProgress('E:/jade_prism/embeddings/phrase-embed-over-time/mega_WOS_word2vec/w2v_models/in_progress/')
Then I load the baseline model, which already has its vocab built, and set some parameters:
model = gensim.models.Word2Vec.load(baseline_models_directory+chosen_name)
model.window = window
model.size = size
model.workers = workers
model.callbacks = [epoch_progress]
Then I train like this:
model.train(corpus, total_examples=model.corpus_count, epochs=epochs)
Finally, I save the final product like this:
model.save('E:/w2v_models/trained/{}'.format(new_model_filename))
Training seemed to work fine and the model saved as expected. Unfortunately, I now can't load it.
Here is the full debug readout:
> AttributeError                            Traceback (most recent call last)
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
>    1329         try:
> -> 1330             model = super(Word2Vec, cls).load(*args, **kwargs)
>    1331
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, *args, **kwargs)
>    1243         """
> -> 1244         model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
>    1245         if not hasattr(model, 'ns_exponent'):
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, fname_or_handle, **kwargs)
>     602         """
> --> 603         return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
>     604
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in load(cls, fname, mmap)
>     425
> --> 426         obj = unpickle(fname)
>     427         obj._load_specials(fname, mmap, compress, subname)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in unpickle(fname)
>    1383         if sys.version_info > (3, 0):
> -> 1384             return _pickle.load(f, encoding='latin1')
>    1385         else:
>
> AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
>
> During handling of the above exception, another exception occurred:
>
> AttributeError                            Traceback (most recent call last)
> <ipython-input-4-0206f9f8f3ad> in <module>
>       3
>       4 # Load the model based on the model name
> ----> 5 model = gensim.models.Word2Vec.load(model_name)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
>    1339             logger.info('Model saved using code from earlier Gensim Version. Re-loading old model in a compatible way.')
>    1340             from gensim.models.deprecated.word2vec import load_old_word2vec
> -> 1341             return load_old_word2vec(*args, **kwargs)
>    1342
>    1343
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load_old_word2vec(*args, **kwargs)
>     170
>     171 def load_old_word2vec(*args, **kwargs):
> --> 172     old_model = Word2Vec.load(*args, **kwargs)
>     173     vector_size = getattr(old_model, 'vector_size', old_model.layer1_size)
>     174     params = {
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load(cls, *args, **kwargs)
>    1639     @classmethod
>    1640     def load(cls, *args, **kwargs):
> -> 1641         model = super(Word2Vec, cls).load(*args, **kwargs)
>    1642         # update older models
>    1643         if hasattr(model, 'table'):
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in load(cls, fname, mmap)
>      85         compress, subname = SaveLoad._adapt_by_suffix(fname)
>      86
> ---> 87         obj = unpickle(fname)
>      88         obj._load_specials(fname, mmap, compress, subname)
>      89         logger.info("loaded %s", fname)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in unpickle(fname)
>     377                 b'gensim.models.wrappers.fasttext', b'gensim.models.deprecated.fasttext_wrapper')
>     378         if sys.version_info > (3, 0):
> --> 379             return _pickle.loads(file_bytes, encoding='latin1')
>     380         else:
>     381             return _pickle.loads(file_bytes)
>
> AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
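Python pickling/unpickling can run into trouble with classes or instances that were defined before the save but are not available at load time. (In particular, anonymous or global-scope types that were not imported from an explicit path.) The pickle records only the module and name of the class, so the load fails if that name cannot be found again.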
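To make the mechanism concrete, here is a minimal sketch using only the standard library. It is not gensim code: the `training_session` module and the plain `EpochProgress` class are hypothetical stand-ins for `__main__` and the real callback, chosen so the snippet behaves the same wherever it is run.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module the class lived in when the model was
# saved (in the question that module is `__main__`).
training_session = types.ModuleType("training_session")
sys.modules["training_session"] = training_session


class EpochProgress:
    """Plain stand-in for the callback; not the gensim class."""

    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0


# Register the class under the stand-in module so pickle records that location.
EpochProgress.__module__ = "training_session"
EpochProgress.__qualname__ = "EpochProgress"
training_session.EpochProgress = EpochProgress

payload = pickle.dumps(EpochProgress("demo_prefix"))

# The bytes hold the module and class *names* plus the instance attributes,
# not the class body.
names_stored = b"training_session" in payload and b"EpochProgress" in payload

# Simulate a new session in which the class was never defined.
del training_session.EpochProgress
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

# Putting the name back lets the very same bytes load again.
training_session.EpochProgress = EpochProgress
restored = pickle.loads(payload)
print(type(load_error).__name__, restored.path_prefix)
```

The saved bytes never change in this sketch; only the availability of the name at load time does, which is why a model that saved cleanly can still fail to load later.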
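This is a known issue with gensim model saving, and a future version may avoid storing such callback code in the model at all. (Instead, callbacks would have to be specified each time a method that uses them is called, and they would only be in effect for that one call.)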
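For more details, see gensim project issue #2136, which includes a workaround that seems to have helped others re-load their models: make sure the same EpochProgress class is defined/imported in the place where you attempt the load.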