Word sense disambiguation with WordNet: how to select words related to the same meaning



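I am using WordNet and NLTK for word sense disambiguation. I am interested in all words related to sound. I have a list of such words, and "roll" is one of them. I then check whether any of my sentences contains this word (I also check the POS). If it does, I want to select only the sentences where the word relates to sound. In the example below, that would be the second sentence. My current idea is simply to select words whose definition contains the word "sound", as in "the sound of a drum (especially a snare drum) beaten rapidly and continuously". But I suspect there is a more elegant way. Any ideas would be much appreciated!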

from nltk import word_tokenize
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn

samples = [('The van rolled along the highway.', 'n'),
           ('The thunder rolled and the lightning striked.', 'n')]
word = 'roll'
for sentence, pos_tag in samples:
    # lesk picks the synset of `word` that best matches the sentence context
    word_syn = lesk(word_tokenize(sentence.lower()), word, pos_tag)
    print('Sentence:', sentence)
    print('Word synset:', word_syn)
    print('Corresponding definition:', word_syn.definition())

Output:

Sentence: The van rolled along the highway.
Word synset: Synset('scroll.n.02')
Corresponding definition: a document that can be rolled up (as for storage)
Sentence: The thunder rolled and the lightning striked.
Word synset: Synset('paradiddle.n.01')
Corresponding definition: the sound of a drum (especially a snare drum) beaten rapidly and continuously

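You can use WordNet hypernyms (synsets with a more general meaning). My first idea would be to walk upward from the current synset (using synset.hypernyms()) and keep checking whether I reach a "sound" synset. I would stop when I hit the root, that is, a synset with no hypernyms, for which synset.hypernyms() returns an empty list.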

For your two examples, this produces the following synset sequences:

Sentence:The van rolled along the highway .
Word synset:Synset('scroll.n.02')
[Synset('manuscript.n.02')]
[Synset('autograph.n.01')]
[Synset('writing.n.02')]
[Synset('written_communication.n.01')]
[Synset('communication.n.02')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]
Sentence:The thunder rolled and the lightning striked .
Word synset:Synset('paradiddle.n.01')
[Synset('sound.n.04')]
[Synset('happening.n.01')]
[Synset('event.n.01')]
[Synset('psychological_feature.n.01')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]

So one of the synsets you might want to look for is sound.n.04. There may be others, though; I think you can work through more examples and build up a list.
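
The upward walk described above could look roughly like the sketch below. It is only an illustration, not the code that produced the sequences shown: the function name `has_hypernym`, the set `SOUND_SYNSETS`, and the `FakeSynset` stand-in class are all hypothetical. The stand-in mimics just the two NLTK `Synset` methods the walk needs (`name()` and `hypernyms()`), so the example runs without the WordNet corpus; the chains are abbreviated versions of the ones listed above.

```python
from collections import deque


def has_hypernym(synset, target_names):
    """Return True if synset, or any synset above it, is named in target_names."""
    queue = deque([synset])
    seen = set()
    while queue:
        current = queue.popleft()
        name = current.name()
        if name in target_names:
            return True
        if name in seen:
            continue
        seen.add(name)
        # At the root, hypernyms() returns an empty list, so nothing is queued.
        queue.extend(current.hypernyms())
    return False


class FakeSynset:
    """Hypothetical stand-in for an NLTK Synset (assumption, for illustration)."""

    def __init__(self, name, parents=()):
        self._name = name
        self._parents = list(parents)

    def name(self):
        return self._name

    def hypernyms(self):
        return self._parents


# Abbreviated versions of the two hypernym chains from the answer.
entity = FakeSynset("entity.n.01")
event = FakeSynset("event.n.01", [entity])
sound = FakeSynset("sound.n.04", [event])
paradiddle = FakeSynset("paradiddle.n.01", [sound])
writing = FakeSynset("writing.n.02", [entity])
scroll = FakeSynset("scroll.n.02", [writing])

# Assumed target list; extend it as more sound-related synsets turn up.
SOUND_SYNSETS = {"sound.n.04"}

print(has_hypernym(paradiddle, SOUND_SYNSETS))  # True
print(has_hypernym(scroll, SOUND_SYNSETS))      # False
```

With NLTK itself, the synset returned by `lesk` can be passed to the same function in place of the stand-in objects. A queue is used rather than a single chain because a WordNet synset can have more than one hypernym.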
