I would like to know whether I can store a gensim Phrases model after training it on sentences.
from gensim.models.phrases import Phrases

documents = [
    "the mayor of new york was there",
    "human computer interaction and machine learning has now become a trending research area",
    "human computer interaction is interesting",
    "human computer interaction is a pretty interesting subject",
    "human computer interaction is a great and new subject",
    "machine learning can be useful sometimes",
    "new york mayor was present",
    "I love machine learning because it is a new subject area",
    "human computer interaction helps people to get user friendly applications",
]
sentences = [doc.split(" ") for doc in documents]
bigram_transformer = Phrases(sentences)
bigram_sentences = bigram_transformer[sentences]
print("Bigrams - done")
# Here we use a phrase model that detects the collocation of 3 words (trigrams).
trigram_transformer = Phrases(bigram_sentences)
trigram_sentences = trigram_transformer[bigram_sentences]
print("Trigrams - done")
How can I physically store trigram_transformer (for example with pickle) so that I can use it again later?
Thanks in advance for your help.
You can use Gensim's native .save() method:
trigram_transformer.save(TRIPHRASER_PATH)
Then reload it the same way:
reloaded_trigram_transformer = Phrases.load(TRIPHRASER_PATH)
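(Gensim's save/load methods generally use Python pickle under the hood, but for some models, and for conversions between versions, they may handle certain attributes specially.)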
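You could also use Python's own pickle directly. That should work fine unless, or until, you try to load a model pickled by a much older Gensim into a newer Gensim version whose Phrases model may have changed.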
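Convert the list (or whatever format your output is in) into a numpy array and save it as an .npy file. It is easy to save and read, and with numpy you can load it on almost any platform (Google Colab, replit, ...). For more details on saving .npy files, see the numpy.save() documentation.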
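Note that this approach stores the phrased sentences (the transformer's output), not the Phrases model itself. Because tokenized sentences have different lengths, they end up in a NumPy object array, and NumPy stores object arrays via pickle inside the .npy file, so loading requires allow_pickle=True. A minimal sketch, with a hypothetical phrased corpus and file name:

```python
import os
import tempfile

import numpy as np

# Hypothetical output of a phrase transformer: a ragged list of token lists.
phrased = [["new_york", "mayor"], ["machine_learning", "is", "fun", "today"]]

# Fill a 1-D object array element by element so each entry stays a Python list.
arr = np.empty(len(phrased), dtype=object)
for i, tokens in enumerate(phrased):
    arr[i] = tokens

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "phrased_sentences.npy")  # hypothetical file name
    np.save(path, arr)  # object arrays are pickled inside the .npy file
    loaded = np.load(path, allow_pickle=True)  # must opt in to unpickling

restored = [list(tokens) for tokens in loaded]
print(restored)
```

Filling an empty object array keeps it one-dimensional even when every sentence happens to have the same length, where np.array(phrased, dtype=object) would instead build a 2-D array of strings.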
Pickle is also a good option, but things get a bit tricky once encoding differences and similar issues come into play.