How to store a gensim Phrases trigram model after training

I'd like to know whether I can store a gensim Phrases model after training it on sentences:

from gensim.models.phrases import Phrases

documents = [
    "the mayor of new york was there",
    "human computer interaction and machine learning has now become a trending research area",
    "human computer interaction is interesting",
    "human computer interaction is a pretty interesting subject",
    "human computer interaction is a great and new subject",
    "machine learning can be useful sometimes",
    "new york mayor was present",
    "I love machine learning because it is a new subject area",
    "human computer interaction helps people to get user friendly applications",
]
sentences = [doc.split(" ") for doc in documents]
bigram_transformer = Phrases(sentences)
bigram_sentences = bigram_transformer[sentences]
print("Bigrams - done")
# Here we use a phrase model that detects the collocation of 3 words (trigrams).
trigram_transformer = Phrases(bigram_sentences)
trigram_sentences = trigram_transformer[bigram_sentences]
print("Trigrams - done")

How can I physically store trigram_transformer so I can use it again later, for example with pickle?

Thanks in advance for your help.

You can use Gensim's native .save() method:

trigram_transformer.save(TRIPHRASER_PATH)

Then reload it similarly:

reloaded_trigram_transformer = Phrases.load(TRIPHRASER_PATH)

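(Gensim's save/load methods generally use Python pickle under the hood, but for some models and version transitions they may handle certain attributes specially.)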

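You could also use Python's own pickle directly. It should work fine unless/until you try to load a model that is too old into a newer version of Gensim in which the Phrases model may have changed.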

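Convert the list (or whatever format the output is in) into a NumPy array and save it as an .npy file. It is easy to save and read, and NumPy can load it on almost any platform (Google Colab, Replit, ...). For more details on saving .npy files, see the numpy.save() documentation.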
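Note that this approach stores the transformed sentences rather than the Phrases model itself, so you cannot apply it to new text afterwards. A minimal sketch, assuming the output has already been materialized as a list of token lists; the sample tokens and the file name below are made up for illustration. Sentences have different lengths, so they go into a 1-D object array, and loading such an array requires `allow_pickle=True`.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for list(trigram_transformer[bigram_sentences]):
# token lists of different lengths (these phrase joins are made up).
trigram_sentences = [
    ["the", "mayor", "of", "new_york", "was", "there"],
    ["new_york", "mayor", "was", "present"],
    ["human_computer_interaction", "is", "interesting"],
]

# Ragged lists cannot form a regular 2-D array, so store one list per
# element of a 1-D object array.
arr = np.empty(len(trigram_sentences), dtype=object)
for i, tokens in enumerate(trigram_sentences):
    arr[i] = tokens

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trigram_sentences.npy")  # hypothetical name
    np.save(path, arr)  # object arrays are serialized with pickle internally
    # Loading object arrays requires allow_pickle=True (the default is False).
    restored = np.load(path, allow_pickle=True).tolist()

print(restored == trigram_sentences)
```

Because object arrays rely on pickle, the same cross-version caveats as plain pickle apply to these files.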

Using pickle is also a good option, but things can get a bit tricky when differences in encoding standards and similar issues come up.
