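I have seen many questions and answers saying that NLTK collocations cannot go beyond bigrams and trigrams.
For example: How to get n-gram collocations and association in python nltk?
I see that there is something called
nltk.QuadgramCollocationFinder
similar to
nltk.BigramCollocationFinder and nltk.TrigramCollocationFinder
But at the same time I cannot see anything like
nltk.collocations.QuadgramAssocMeasures()
similar to nltk.collocations.BigramAssocMeasures() and nltk.collocations.TrigramAssocMeasures()
What is the purpose of nltk.QuadgramCollocationFinder if it is not possible (without hacks) to find n-grams beyond bigrams and trigrams?
Maybe I am missing something.
Thanks,
Adding code and updating the question based on input from Alvas; this now works: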
import nltk
from nltk.collocations import *
from nltk.corpus import PlaintextCorpusReader
from nltk.metrics.association import QuadgramAssocMeasures

bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()
quadgram_measures = QuadgramAssocMeasures()

# apply_ngram_filter removes every n-gram for which the filter returns True,
# so this keeps only n-grams that contain the word 'crazy'.
the_filter = lambda *w: 'crazy' not in w

# `corpus` is not defined in this snippet; it must be a sequence of word tokens.
finder = BigramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)  # drop n-grams that occur fewer than 3 times
finder.apply_ngram_filter(the_filter)
print(finder.nbest(bigram_measures.likelihood_ratio, 10))

finder = QuadgramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)
finder.apply_ngram_filter(the_filter)
print(finder.nbest(quadgram_measures.likelihood_ratio, 10))
From the repo:
from nltk.metrics.association import QuadgramAssocMeasures
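
As far as I know, NLTK ships dedicated finders only up to quadgrams, so for larger n a plain frequency count may be a usable fallback. The sketch below is not part of NLTK and is not an association measure like likelihood_ratio; it only ranks n-grams by raw count. The function name top_ngrams and its parameters (min_freq, required_word, k) are made up for illustration, loosely mirroring what apply_freq_filter and the 'crazy' filter do in the question.

```python
from collections import Counter


def top_ngrams(tokens, n, min_freq=1, required_word=None, k=10):
    """Return up to k most frequent n-grams as (ngram, count) pairs.

    Hypothetical helper: ranks by raw frequency only, not by an
    association score such as the likelihood ratio.
    """
    # Slide a window of width n over the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(grams)
    kept = [
        (gram, count)
        for gram, count in counts.items()
        # Keep n-grams that are frequent enough and, if requested,
        # contain the required word.
        if count >= min_freq
        and (required_word is None or required_word in gram)
    ]
    # Highest count first; ties are broken alphabetically for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:k]


tokens = "he is a crazy old man and she is a crazy old woman".split()
print(top_ngrams(tokens, 4, min_freq=2, required_word="crazy"))
```

The same call with n=5 or higher works unchanged, which is the main reason to fall back to counting when no finder class exists for that n.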