Why does NLTK's Text.similar() return None?

I'm currently using NLTK's similar() method, but it isn't working as expected. See the code snippet below:

from nltk import word_tokenize
import nltk

text = """
The girl is very pretty.
"""
text = nltk.Text(word_tokenize(text))
text.similar('beautiful')  # it returns "no matches" but pretty is a synonym of beautiful.

Am I using the wrong method? Or is there another one I should use? Please help me.

The similar() method of NLTK's Text class uses distributional similarity.

The help() for the method states:

similar(word, num=20) method of nltk.text.Text instance
    Distributional similarity: find other words which appear in the
    same contexts as the specified word; list most similar words first.

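Looking at the source code, similar() uses an instance of the ContextIndex class to find words with similar semantic windows. By default it uses a +/- 1 word window, i.e. the word immediately before and the word immediately after.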
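To make the idea concrete, here is a minimal pure-Python sketch of distributional similarity with a +/- 1 word window. It is not NLTK's actual implementation: the helper names build_context_index and similar_words, the "*START*"/"*END*" padding markers, and the ranking by number of shared contexts are all assumptions made for illustration, and the tokens are split on whitespace rather than with word_tokenize.

```python
from collections import Counter, defaultdict


def build_context_index(tokens):
    """Map each lowercased word to the set of (left, right) neighbours it occurs between.

    Hypothetical helper for illustration; only alphabetic tokens are kept.
    """
    words = [t.lower() for t in tokens if t.isalpha()]
    padded = ["*START*"] + words + ["*END*"]  # assumed padding markers
    contexts = defaultdict(set)
    for i in range(1, len(padded) - 1):
        contexts[padded[i]].add((padded[i - 1], padded[i + 1]))
    return contexts


def similar_words(tokens, word, num=20):
    """Return words sharing at least one +/- 1 word context with `word`.

    Words are ordered by how many distinct contexts they share (an assumed ranking).
    """
    contexts = build_context_index(tokens)
    target = contexts.get(word.lower())
    if not target:
        return []  # the word never occurs in the text, so there is nothing to compare
    counts = Counter()
    for other, ctxs in contexts.items():
        if other != word.lower():
            shared = len(ctxs & target)
            if shared:
                counts[other] = shared
    return [w for w, _ in counts.most_common(num)]


short = "The girl is very pretty .".split()
longer = "The girl is pretty is she ? The girl is beautiful is she ?".split()

print(similar_words(short, "beautiful"))  # [] because "beautiful" is not in the text
print(similar_words(longer, "pretty"))    # ['beautiful'], both occur between "is" and "is"
```

The point of the sketch is that the comparison is driven purely by neighbouring words in the text you supply. No dictionary or thesaurus is consulted, so a synonym that never appears in the text cannot be found.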

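If we extend your example with additional words, so that "pretty" and "beautiful" appear in similar semantic windows, we get the result you are looking for.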

from nltk import word_tokenize
import nltk
text = "The girl is pretty isn't she? The girl is beautiful isn't she?"
text = nltk.Text(word_tokenize(text))
text.similar('pretty')
# prints beautiful

So it seems you need more context in your text for similar() to give meaningful results.
