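How to find similar sentences based on a keyword that does not appear directly in the sentence

I need to return the text that contains a given keyword. Consider the following example: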

keyword = "configure"
texts = [ 
"The system configuration document should be uploaded to the repository. Please contact the dev team.",
"To do the system setup, please follow the instructions." 
]

The keyword configure does not appear in any of the texts, but the similar word configuration appears in the first one. The expected output is therefore:

The system configuration document should be uploaded to the repository. Please contact the dev team.

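I know that it is possible to compute the semantic similarity between a word and a text [1]. However, in my case it often returns inaccurate results.

Another approach I am evaluating is to apply stemming and lemmatization. However, configure and configuration have different stems.

Finally, I also considered Word2Vec models. However, I am not sure how to use this approach effectively in this case.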

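import gensim.downloader as api
# Pretrained GloVe vectors (downloaded on first use), loaded as gensim KeyedVectors
word_vectors = api.load("glove-wiki-gigaword-100")
# Cosine similarity between the two word vectors
print(word_vectors.similarity("configure", "configuration"))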

Is there a state-of-the-art approach for my task?

[1]: https://medium.com/@adrienseg/text-similities-da019229c894

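If your sentences are not too long, you can try summing the word vectors of the words in each sentence and then computing the similarity between your keyword's vector and that sum.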
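A minimal sketch of the summing idea, under some assumptions: `toy_vectors` is a made-up 3-dimensional vocabulary used only so the example runs without downloading a model, and the helper names (`tokenize`, `sentence_vector`, `most_similar_text`) are hypothetical. The functions only rely on `word in vectors` and `vectors[word]`, so the `word_vectors` object loaded with gensim in the question should be usable in place of the toy dictionary.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def sentence_vector(text, vectors):
    """Sum the vectors of all in-vocabulary words; None if no word is known."""
    total = None
    for word in tokenize(text):
        if word in vectors:
            vec = vectors[word]
            total = list(vec) if total is None else [a + b for a, b in zip(total, vec)]
    return total

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar_text(keyword, texts, vectors):
    """Return the text whose summed word vector is closest to the keyword vector."""
    key_vec = vectors[keyword]
    scored = []
    for text in texts:
        vec = sentence_vector(text, vectors)
        score = cosine(key_vec, vec) if vec is not None else float("-inf")
        scored.append((score, text))
    return max(scored, key=lambda pair: pair[0])[1]

# Toy vectors invented for illustration only (NOT real embeddings).
toy_vectors = {
    "configure":     [1.0, 0.1, 0.0],
    "configuration": [0.9, 0.2, 0.0],
    "system":        [0.1, 1.0, 0.0],
    "setup":         [0.4, 0.6, 0.2],
    "instructions":  [0.0, 0.2, 1.0],
}

texts = [
    "The system configuration document should be uploaded to the repository. Please contact the dev team.",
    "To do the system setup, please follow the instructions.",
]

best = most_similar_text("configure", texts, toy_vectors)
print(best)
```

With these toy vectors the first text scores higher. Note that with cosine similarity, summing and averaging the word vectors give the same ranking, since both point in the same direction.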

Otherwise, you can try extracting keywords from each sentence and then summing their vectors to find the sentence closest to your keyword.
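A rough sketch of the keyword-based variant, with loud assumptions: keyword extraction here is just stopword removal with a tiny hand-picked `STOPWORDS` set, which is a crude stand-in for a real extractor (TF-IDF, RAKE, and so on), and `toy_vectors` is again an invented vocabulary rather than real embeddings. All function names are hypothetical.

```python
import math
import re

# Tiny hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "to", "be", "should", "please", "do", "a", "of", "and"}

def extract_keywords(text):
    """Naive keyword extraction: unique non-stopword tokens, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return list(dict.fromkeys(t for t in tokens if t not in STOPWORDS))

def sum_vectors(words, vectors):
    """Sum the vectors of the words found in the vocabulary; None if none are found."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(components) for components in zip(*known)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_texts(keyword, texts, vectors):
    """Rank texts by similarity between the keyword and each text's summed keyword vectors."""
    key_vec = vectors[keyword]
    ranked = []
    for text in texts:
        vec = sum_vectors(extract_keywords(text), vectors)
        score = cosine(key_vec, vec) if vec is not None else float("-inf")
        ranked.append((score, text))
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)

# Toy vectors invented for illustration only (NOT real embeddings).
toy_vectors = {
    "configure":     [1.0, 0.1, 0.0],
    "configuration": [0.9, 0.2, 0.0],
    "system":        [0.1, 1.0, 0.0],
    "setup":         [0.4, 0.6, 0.2],
    "instructions":  [0.0, 0.2, 1.0],
}

texts = [
    "The system configuration document should be uploaded to the repository. Please contact the dev team.",
    "To do the system setup, please follow the instructions.",
]

ranked = rank_texts("configure", texts, toy_vectors)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Filtering before summing matters more for long sentences, where many function words would otherwise pull the summed vector away from the content words.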
