How to store a gensim Phrases trigram model after training



I would like to know whether I can store a gensim Phrases model after training it on sentences.

from gensim.models.phrases import Phrases

documents = [
    "the mayor of new york was there",
    "human computer interaction and machine learning has now become a trending research area",
    "human computer interaction is interesting",
    "human computer interaction is a pretty interesting subject",
    "human computer interaction is a great and new subject",
    "machine learning can be useful sometimes",
    "new york mayor was present",
    "I love machine learning because it is a new subject area",
    "human computer interaction helps people to get user friendly applications",
]
sentences = [doc.split(" ") for doc in documents]
# First pass: learn two-word collocations (bigrams).
bigram_transformer = Phrases(sentences)
bigram_sentences = bigram_transformer[sentences]
print("Bigrams - done")
# Here we use a phrase model that detects the collocation of 3 words (trigrams).
trigram_transformer = Phrases(bigram_sentences)
trigram_sentences = trigram_transformer[bigram_sentences]
print("Trigrams - done")

How can I physically store trigram_transformer so that I can reuse it later, for example with pickle?

Thanks in advance for your help.

You can use Gensim's native .save() method:

trigram_transformer.save(TRIPHRASER_PATH)

Then reload it the same way:

reloaded_trigram_transformer = Phrases.load(TRIPHRASER_PATH)

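(Gensim's save/load methods generally use Python pickle under the hood, but for some models and version migrations they may handle certain attributes specially.)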

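You can also use Python's own pickle. It should work fine unless/until you try to load a model that is too old into a newer version of Gensim in which the Phrases model may have changed.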

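Convert the list (or whatever output format you end up with) to a numpy array and save it as a .npy file. It is easy to save and read, and with numpy you can load it on almost any platform (Google Colab, Replit, ...). For more details on saving .npy files with numpy.save(), see this link.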
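A minimal sketch of the .npy approach, assuming the thing being saved is the transformed token lists (such as trigram_sentences) rather than the Phrases model itself; the three sentences below are made-up stand-ins. Because the sentences differ in length, the array has dtype=object, which means numpy pickles the elements inside the .npy file and np.load needs allow_pickle=True to read them back.

```python
import os
import tempfile

import numpy as np

# Stand-in for trigram_sentences: made-up token lists of different lengths.
trigram_sentences = [
    ["human_computer_interaction", "is", "interesting"],
    ["machine_learning", "can", "be", "useful", "sometimes"],
    ["new_york", "mayor", "was", "present"],
]

# Build a 1-D object array explicitly so each element holds one token list.
arr = np.empty(len(trigram_sentences), dtype=object)
for i, tokens in enumerate(trigram_sentences):
    arr[i] = tokens

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "trigram_sentences.npy")
np.save(path, arr)  # object arrays are pickled inside the .npy file

# Loading object arrays requires allow_pickle=True.
loaded = np.load(path, allow_pickle=True)
restored = [list(tokens) for tokens in loaded]
print(restored == trigram_sentences)
```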

Using pickle is also a good option, but things can get a bit tricky when encoding differences and similar issues come up.
