Maximum score of WordNet-based similarity measures



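Some similarity scores lie between 0 and 1, for example shortest path and WuP, so the similarity between car and automobile is 1. For other measures such as LCH, however, the result is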

lch( car, automobile ) = 3.6889

I would like to know the maximum score of these measures. Is 3.6889 the maximum value? Does that mean LCH scores range from 0 to 3.6889?

I have added the following measures:

jcn( car, automobile ) = 12876699.5
res( car, automobile ) = 9.3679
lesk( car, automobile ) = 9519 

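It seems that 3.6375861597263857 is the maximum value of lch_similarity (I was not able to obtain 3.6889…). According to the documentation, lch_similarity has the following properties: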

Leacock Chodorow Similarity:
        Return a score denoting how similar two word senses are, based on the
        shortest path that connects the senses (as above) and the maximum depth
        of the taxonomy in which the senses occur. The relationship is given as
        -log(p/2d) where p is the shortest path length and d is the taxonomy
        depth.
...
:return: A score denoting the similarity of the two ``Synset`` objects,
            normally greater than 0. None is returned if no connecting path
            could be found. If a ``Synset`` is compared with itself, the
            maximum score is returned, which varies depending on the taxonomy
            depth.

Assuming that rock_hind.n.01 sits at the deepest level of the WordNet taxonomy (19) and change.n.06 at the shallowest (2), we can experiment with synsets at different depths:

>>> from nltk.corpus import wordnet as wn
>>> rock = wn.synset('rock_hind.n.01')
>>> change = wn.synset('change.n.06')
>>> rock.lch_similarity(rock)
3.6375861597263857
>>> change.lch_similarity(change)
3.6375861597263857
>>> change.lch_similarity(rock)
0.7472144018302211
>>> rock.lch_similarity(change)
0.7472144018302211
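
The values above are consistent with the -log(p/2d) formula from the docstring. The following is a minimal sketch in plain Python (no NLTK needed), not the library's implementation. It assumes that p is counted in nodes, so a synset compared with itself gives p = 1, and that d = 19 as stated above. The helper names lch_score and path_nodes_from_score are hypothetical.

```python
import math

def lch_score(path_nodes, depth):
    """-log(p / 2d), with p assumed to be the path length counted in nodes."""
    return -math.log(path_nodes / (2.0 * depth))

def path_nodes_from_score(score, depth):
    """Invert the formula: recover p from a score and a taxonomy depth."""
    return 2.0 * depth * math.exp(-score)

# Identical synsets (p = 1) give the largest possible score for a given depth.
max_depth_19 = lch_score(1, 19)   # about 3.6376
max_depth_20 = lch_score(1, 20)   # about 3.6889

# The rock/change score above corresponds to a path of about 18 nodes.
rock_change_nodes = path_nodes_from_score(0.7472144018302211, 19)

print(max_depth_19, max_depth_20, rock_change_nodes)
```

Under these assumptions the maximum depends only on d: a depth of 19 gives about 3.6376, and a depth of 20 would give about 3.6889, which may explain the figure in the question if that tool counts the taxonomy as one level deeper.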

The same kind of experiment can be run for the other measures, where the range appears to be much larger:

>>> from nltk.corpus import wordnet_ic, genesis
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
>>> genesis_ic = wn.ic(genesis, False, 0.0)
>>> rock.res_similarity(rock, brown_ic) # res_similarity, brown
1e+300
>>> rock.res_similarity(change, brown_ic)
-0.0
>>> rock.res_similarity(rock, semcor_ic) # res_similarity, semcor
1e+300
>>> rock.res_similarity(change, semcor_ic)
-0.0
>>> rock.res_similarity(rock, genesis_ic) # res_similarity, genesis
1e+300
>>> rock.res_similarity(change, genesis_ic)
-0.08306855877006339
>>> change.res_similarity(rock, genesis_ic)
-0.08306855877006339
>>> rock.jcn_similarity(rock, brown_ic) # jcn, brown - results are identical with semcor and genesis
1e+300
>>> rock.jcn_similarity(change, brown_ic)
1e-300
>>> change.jcn_similarity(rock, brown_ic)
1e-300
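
To see why these ranges are so wide, it can help to look at the textbook definitions of the two measures: Resnik returns the information content (IC) of the least common subsumer, and Jiang-Conrath returns the inverse of an IC distance. The sketch below uses hypothetical counts and hypothetical helper names, and it is not NLTK's code. Returning infinity for a zero distance is a choice made for this sketch; the 1e+300 and 1e-300 values above look like a large finite stand-in for the same situation, but that is an assumption.

```python
import math

def information_content(count, total):
    """IC of a concept observed `count` times out of `total` observations."""
    return -math.log(count / total)

def res_score(lcs_ic):
    """Resnik: the IC of the least common subsumer (LCS)."""
    return lcs_ic

def jcn_score(ic_a, ic_b, lcs_ic):
    """Jiang-Conrath: 1 / (IC(a) + IC(b) - 2 * IC(lcs))."""
    distance = ic_a + ic_b - 2.0 * lcs_ic
    if distance == 0:
        return math.inf  # sketch-only choice for identical concepts
    return 1.0 / distance

total = 1000                                   # hypothetical corpus size
ic_rare = information_content(1, total)        # rarest possible concept
ic_common = information_content(500, total)
ic_root = information_content(total, total)    # root covers everything: -0.0

print(res_score(ic_rare))                      # largest Resnik score: log(total)
print(res_score(ic_root))                      # LCS is the root
print(jcn_score(ic_rare, ic_common, ic_root))  # distant concepts: small score
print(jcn_score(ic_rare, ic_rare, ic_rare))    # identical concepts: unbounded
```

In this sketch the Resnik score is capped by the corpus size, since the rarest concept has IC = log(total), while the Jiang-Conrath score has no finite upper bound because the distance in the denominator can shrink to zero. Neither measure has a fixed maximum comparable to the 0 to 1 range of path or WuP similarity.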
