How to cluster keywords or get keyword similarity when I already have their vectors

I have a Python dictionary, saved with pickle, that was built with Bert-as-Service and Google's pretrained model. It is stored as a vector file in the following form:

key (phrase): value (Phrase_Vector_from_Bert), e.g. women's cloth: 1.3237 -2.6354 1.7458 ....

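However, I don't know how to get the similarity between phrases from this Bert-as-Service vector file the way I did with Gensim Word2Vec, because the latter comes with a .similarity method.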

Could you suggest how to get phrase/keyword similarity, or how to cluster the phrases, using my pickled Python dictionary of vectors?

Or is there perhaps a better way to cluster keywords with Bert-as-Service?

The following code shows how I get the vectors for the phrases/keywords:

from Myutility import save_model
# the file Myutility includes the functions save_model and load_model
import BertCommand
# the file BertCommand includes the function to start the Bert-as-service client;
# `bc` below is that client (how it is created is not shown here)

WORD_PATH = 'E:/Works/testwords.txt'
WORD_FEATURE = 'E:/Works/word.google.vector'

# Read one phrase per line, skipping empty lines
word_vectors = {}
with open(WORD_PATH) as f:
    for line in f:
        line = line.strip('\n')
        if line:
            print(line)
            word_vectors[line] = None

# Encode each phrase; phrases that fail to encode keep the value None
for word in word_vectors:
    try:
        v = bc.encode([word])
        word_vectors[word] = v
    except Exception:
        pass

# Pickle the {phrase: vector} dictionary to disk
save_model(word_vectors, WORD_FEATURE)

If I understand correctly, you already have a vector for each phrase.

In that case, you can simply compute the cosine similarity between two phrase vectors.
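
As a minimal sketch of that idea with NumPy: the helper names `cosine_similarity` and `most_similar` and the toy 3-dimensional vectors are made up for illustration, and the code assumes each dictionary value is either a flat vector or a (1, dim) array like the one returned for a single-item list, so it flattens every value first.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    # Flatten so that both (dim,) and (1, dim) inputs are accepted.
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(np.dot(a, b) / denom)

def most_similar(phrase, word_vectors, topn=3):
    """Rank the other phrases by cosine similarity to `phrase`."""
    query = word_vectors[phrase]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in word_vectors.items()
        if other != phrase and vec is not None  # skip phrases that failed to encode
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:topn]

# Toy vectors standing in for real BERT vectors (illustrative values only).
toy_vectors = {
    "women's cloth": np.array([[1.0, 2.0, 0.0]]),
    "ladies dress": np.array([[2.0, 4.0, 0.0]]),
    "car engine": np.array([[0.0, 0.0, 3.0]]),
}

if __name__ == "__main__":
    print(most_similar("women's cloth", toy_vectors, topn=2))
```

With the real data you would pass the unpickled dictionary instead of `toy_vectors`; `most_similar` then plays roughly the role that `.similarity` and `.most_similar` play in Gensim.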

For more details and implementations (a manual one and an sklearn one), I suggest this link: https://skipperkongen.dk/2018/09/19/cosine-similarity-in-python/
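
For the clustering part of the question, one possible approach (a suggestion of mine, not something taken from the linked article) is to L2-normalize the vectors and run scikit-learn's `KMeans` on them, since Euclidean distance between unit vectors ranks pairs the same way cosine similarity does. The function name `cluster_phrases`, the number of clusters, and the toy 2-dimensional vectors below are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cluster_phrases(word_vectors, n_clusters=2, seed=0):
    """Group phrases with k-means on L2-normalized vectors."""
    # Keep only phrases that were encoded successfully.
    phrases = [p for p, v in word_vectors.items() if v is not None]
    # Flatten each value so (dim,) and (1, dim) inputs both work.
    X = np.vstack(
        [np.asarray(word_vectors[p], dtype=float).ravel() for p in phrases]
    )
    X = normalize(X)  # unit-length rows
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    clusters = {}
    for phrase, label in zip(phrases, labels):
        clusters.setdefault(int(label), []).append(phrase)
    return clusters

# Toy vectors standing in for real BERT vectors (illustrative values only).
toy_vectors = {
    "women's cloth": [[1.0, 0.1]],
    "ladies dress": [[0.9, 0.2]],
    "car engine": [[0.1, 1.0]],
    "truck motor": [[0.2, 0.9]],
}

if __name__ == "__main__":
    for label, members in cluster_phrases(toy_vectors, n_clusters=2).items():
        print(label, members)
```

The number of clusters has to be chosen by hand here; if you do not know it in advance, a method that takes a distance threshold instead (for example agglomerative clustering) may suit keyword data better.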
