Getting a list of semantically similar words from two Doc2Vec-embedded documents in Python

I am doing text embedding in Python, where I compute the similarity between two documents with a Doc2Vec model. The code is as follows:

```python
# Presumably `model` is a trained gensim Doc2Vec model and `train_corpus` is a list of TaggedDocument objects.
for doc_id in range(len(train_corpus)):
    # Infer a vector for this document from its words
    inferred_vector = model.infer_vector(train_corpus[doc_id].words)
    # Rank every trained document vector by cosine similarity to the inferred vector
    # (in gensim 4.x, model.docvecs is renamed model.dv)
    sims = model.docvecs.most_similar([inferred_vector], topn=len(model.docvecs))
    print('Document ({}): «{}»\n'.format(doc_id, ' '.join(train_corpus[doc_id].words)))
    print(u'SIMILAR/DISSIMILAR DOCS PER MODEL %s:\n' % model)
    # Show the most, second-most, median and least similar documents
    for label, index in [('MOST', 0), ('SECOND-MOST', 1), ('MEDIAN', len(sims)//2), ('LEAST', len(sims) - 1)]:
        print(u'%s %s: «%s»\n' % (label, sims[index], ' '.join(train_corpus[sims[index][0]].words)))
```

Now, given two of these embedded documents, how can I extract a set of semantically similar words from those specific documents?

Please help.

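Only some Doc2Vec modes also train word vectors: dm=1 (the default), or dm=0, dbow_words=1 (DBOW document vectors with added skip-gram word vectors). If you used such a mode, the word vectors are available in the model.wv attribute.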
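The rule above can be written down as a tiny check. This is only an illustration: `trains_word_vectors` is a hypothetical helper, not part of gensim, and it simply mirrors the `dm` and `dbow_words` constructor parameters.

```python
def trains_word_vectors(dm: int = 1, dbow_words: int = 0) -> bool:
    """Hypothetical helper: does a Doc2Vec configuration also learn word vectors?

    PV-DM (dm=1, gensim's default) trains word vectors alongside document vectors.
    PV-DBOW (dm=0) only trains them when dbow_words=1 adds interleaved skip-gram training.
    """
    return bool(dm) or bool(dbow_words)


print(trains_word_vectors())                    # default PV-DM
print(trains_word_vectors(dm=0))                # plain PV-DBOW
print(trains_word_vectors(dm=0, dbow_words=1))  # PV-DBOW plus skip-gram words
```

For example, a model built with `Doc2Vec(train_corpus, dm=0, dbow_words=1)` gets DBOW document vectors plus usable word vectors, while `dm=0` alone leaves the word vectors essentially untrained.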

Calling model.wv.similarity(word1, word2) gives you the pairwise similarity of any two words.
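The number model.wv.similarity returns is the cosine similarity of the two words' vectors. As a rough sketch of that formula (plain Python lists standing in for the model's vectors, not gensim's actual implementation):

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Raises ZeroDivisionError for an all-zero vector; real word vectors are non-zero.
    return dot / (norm_u * norm_v)


print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))  # about 0.96
```

Because only the angle matters, scaling either vector does not change the score, which is why words with similar contexts score close to 1.0 regardless of vector length.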

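So you could iterate over all the words in doc1, compute each one's similarity to every word in doc2, and report the single highest similarity (and the doc2 word that achieves it) for each doc1 word.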
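A minimal sketch of that loop, assuming you pass in a similarity function such as `model.wv.similarity`. The name `similar_word_pairs`, the `min_similarity` cut-off and the toy similarity table are illustrative assumptions, not from the answer. gensim raises `KeyError` for words missing from the vocabulary (e.g. dropped by `min_count`), so those pairs are skipped.

```python
from typing import Callable, Iterable, List, Tuple


def similar_word_pairs(
    doc1_words: Iterable[str],
    doc2_words: Iterable[str],
    similarity: Callable[[str, str], float],
    min_similarity: float = -1.0,  # cosine is in [-1, 1], so the default keeps every match
) -> List[Tuple[str, str, float]]:
    """For each distinct doc1 word, find its most similar doc2 word.

    Returns (doc1_word, best_doc2_word, score) tuples, highest score first.
    """
    doc2 = list(dict.fromkeys(doc2_words))  # unique words, original order
    pairs: List[Tuple[str, str, float]] = []
    for w1 in dict.fromkeys(doc1_words):
        best_word, best_sim = None, float("-inf")
        for w2 in doc2:
            try:
                s = similarity(w1, w2)
            except KeyError:  # word not known to the model
                continue
            if s > best_sim:
                best_word, best_sim = w2, s
        if best_word is not None and best_sim >= min_similarity:
            pairs.append((w1, best_word, best_sim))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Toy stand-in for model.wv.similarity; unknown pairs raise KeyError like an OOV word.
toy_sims = {("cat", "kitten"): 0.9, ("cat", "puppy"): 0.4,
            ("dog", "kitten"): 0.3, ("dog", "puppy"): 0.85}
pairs = similar_word_pairs(["cat", "dog", "zebra"], ["kitten", "puppy"],
                           lambda a, b: toy_sims[(a, b)], min_similarity=0.5)
print(pairs)  # [('cat', 'kitten', 0.9), ('dog', 'puppy', 0.85)]
```

With the question's loop, a call like `similar_word_pairs(train_corpus[doc_id].words, train_corpus[sims[1][0]].words, model.wv.similarity, min_similarity=0.5)` would compare a document with its second-most-similar neighbour (index 0 is usually the document itself). The threshold is a judgment call you would tune on your own data.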
