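Some similarity scores lie between 0 and 1, for example shortest path and WuP (Wu-Palmer). So the similarity between car and automobile would be 1, but for other measures, such as LCH, it is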
lch( car, automobile ) = 3.6889
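I would like to know the maximum score for each of these measures. Is 3.6889 considered the maximum? Does that mean the LCH score ranges from 0 to 3.6889?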
I have also added the following measures:
jcn( car, automobile ) = 12876699.5
res( car, automobile ) = 9.3679
lesk( car, automobile ) = 9519
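It seems that 3.6375861597263857 is the maximum value of lch_similarity (I could not reproduce 3.6889...). According to the documentation, lch_similarity has the following properties: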
Leacock Chodorow Similarity:
Return a score denoting how similar two word senses are, based on the
shortest path that connects the senses (as above) and the maximum depth
of the taxonomy in which the senses occur. The relationship is given as
-log(p/2d) where p is the shortest path length and d is the taxonomy
depth.
...
:return: A score denoting the similarity of the two ``Synset`` objects,
normally greater than 0. None is returned if no connecting path
could be found. If a ``Synset`` is compared with itself, the
maximum score is returned, which varies depending on the taxonomy
depth.
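Plugging numbers into the documented formula -log(p/2d) makes the maximum explicit: the score is largest when p is smallest, and a synset compared with itself gives p = 1, so the maximum is -log(1/2d) = ln(2d). The sketch below is plain Python arithmetic, not NLTK code, and the function name `lch_score` is hypothetical. With d = 19 it reproduces the 3.6375861597263857 self-similarity (ln 38). It also shows that 3.6889 is what the same formula gives for d = 20 (ln 40), which might be how another implementation counts the taxonomy depth, for example with an extra virtual root node; that is a guess, not something the documentation states.

```python
import math

def lch_score(path_len: float, depth: int) -> float:
    """Leacock-Chodorow score -log(p / 2d), as given in the docstring.

    path_len: shortest path length p (assumed to be 1 when a synset is
              compared with itself, which yields the maximum score).
    depth:    maximum depth d of the taxonomy.
    """
    return -math.log(path_len / (2.0 * depth))

# Maximum for d = 19 is ln(38), about 3.6376, like rock.lch_similarity(rock).
print(lch_score(1, 19))
# The same formula with d = 20 gives ln(40), about 3.6889.
print(lch_score(1, 20))
# Longer paths give lower scores.
print(lch_score(18, 19))
```

Because the score only reaches 0 when p = 2d, the formula by itself suggests a range of roughly (0, ln 2d], which fits the docstring's "normally greater than 0" and explains why the upper bound depends on the taxonomy depth.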
Assuming that rock_hind.n.01 is at the deepest level (19) of the WordNet taxonomy and change.n.06 is at the shallowest (2), we can experiment with different depths:
>>> from nltk.corpus import wordnet as wn
>>> rock = wn.synset('rock_hind.n.01')
>>> change = wn.synset('change.n.06')
>>> rock.lch_similarity(rock)
3.6375861597263857
>>> change.lch_similarity(change)
3.6375861597263857
>>> change.lch_similarity(rock)
0.7472144018302211
>>> rock.lch_similarity(change)
0.7472144018302211
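The formula can also be inverted, p = 2d * exp(-score), to see which path length the 0.7472144018302211 result corresponds to. This is a sketch under two assumptions: d = 19 for nouns (the depth implied by the self-similarity value above) and p counted so that identical synsets give p = 1. `implied_path_len` is a hypothetical helper, not part of NLTK.

```python
import math

def implied_path_len(score: float, depth: int) -> float:
    """Invert score = -log(p / 2d) to recover p = 2d * exp(-score)."""
    return 2.0 * depth * math.exp(-score)

# Self-similarity: p comes out as 1 (up to floating-point error).
print(round(implied_path_len(3.6375861597263857, 19), 6))
# rock_hind.n.01 vs change.n.06: p comes out as 18.
print(round(implied_path_len(0.7472144018302211, 19), 6))
```

So, under these assumptions, the gap between the two synsets corresponds to p = 18 out of a possible 2d = 38, which is why the cross score sits well above 0 but far below the maximum.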
Similar experiments can be run for the other measures, where the range seems to be much larger:
>>> from nltk.corpus import wordnet_ic, genesis
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
>>> genesis_ic = wn.ic(genesis, False, 0.0)
>>> rock.res_similarity(rock, brown_ic) # res_similarity, brown
1e+300
>>> rock.res_similarity(change, brown_ic)
-0.0
>>> rock.res_similarity(rock, semcor_ic) # res_similarity, semcor
1e+300
>>> rock.res_similarity(change, semcor_ic)
-0.0
>>> rock.res_similarity(rock, genesis_ic) # res_similarity, genesis
1e+300
>>> rock.res_similarity(change, genesis_ic)
-0.08306855877006339
>>> change.res_similarity(rock, genesis_ic)
-0.08306855877006339
>>> rock.jcn_similarity(rock, brown_ic) # jcn, brown - results are identical with semcor and genesis
1e+300
>>> rock.jcn_similarity(change, brown_ic)
1e-300
>>> change.jcn_similarity(rock, brown_ic)
1e-300
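The 1e+300 and 1e-300 values look like sentinels rather than real scores. One reading that fits the numbers assumes the IC-based measures behave roughly the way NLTK's source appears to: a sentinel of 1e300 stands in for infinity and is used both for identical synsets and as the information content (IC) of a synset with zero corpus counts. Resnik is then IC(lcs) = -log P(lcs), and JCN is 1 / (IC(s1) + IC(s2) - 2 IC(lcs)). If rock_hind.n.01 has no counts in the corpus, its IC is the sentinel, so the JCN denominator is about 1e300 and the score about 1e-300; an LCS at the root has P = 1, hence -log(1) = -0.0. The counts below are invented purely for illustration, and all function names are hypothetical.

```python
import math

INF = 1e300  # sentinel standing in for infinity (assumed, mirroring NLTK's source)

def information_content(count: float, root_count: float) -> float:
    """IC = -log(P(c)) with P(c) = count / root_count; zero counts map to INF."""
    if count == 0:
        return INF
    return -math.log(count / root_count)

def res(ic_lcs: float) -> float:
    """Resnik similarity is simply the IC of the lowest common subsumer."""
    return ic_lcs

def jcn(ic1: float, ic2: float, ic_lcs: float, same: bool = False) -> float:
    """Jiang-Conrath similarity 1 / (IC1 + IC2 - 2 * IC(lcs))."""
    if same:
        return INF  # identical synsets: distance 0, similarity treated as infinite
    if ic1 == 0 or ic2 == 0:
        return 0.0  # a synset with IC 0 (e.g. the root itself) gets score 0
    diff = ic1 + ic2 - 2 * ic_lcs
    if diff == 0:
        return INF  # zero distance between distinct synsets
    return 1 / diff

# Invented counts: the root has 1000 hits, "change" 10, "rock_hind" none.
root = 1000.0
ic_root = information_content(root, root)   # -log(1.0) is -0.0
ic_change = information_content(10, root)   # about 4.6
ic_rock = information_content(0, root)      # INF sentinel

print(res(ic_root))                                # -0.0, like res(rock, change) above
print(jcn(ic_rock, ic_change, ic_root))            # about 1e-300
print(jcn(ic_rock, ic_rock, ic_rock, same=True))   # 1e+300
```

If this reading is right, 1e+300 marks "identical or unseen" rather than a meaningful upper bound, so these measures have no fixed maximum the way LCH does.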