How to sort bigrams by frequency in Python/NLTK

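I am new to Python and NLTK. I want to find the frequency of each bigram in a text (a string) and then sort the bigrams from highest to lowest frequency. So far I have worked out how to get the bigrams and their frequencies: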

import nltk

tokens = nltk.word_tokenize(text)
bgs = nltk.bigrams(tokens)
fdist = nltk.FreqDist(bgs)

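But I don't know how to sort the result from highest to lowest frequency.

I know this is probably easy, but I can't figure it out. I hope someone can help!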

You can keep the bigrams and their counts in two separate lists and sort using those lists. I have also shared a link that I hope will be useful for your problem.

A sample program that generates bigrams from a text:

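# `tokens` comes from nltk.word_tokenize(text), as in the question
bigrams = nltk.bigrams(tokens)
bigrams_freq = nltk.FreqDist(bigrams)
words_bigrams = []
values_bigrams = []

# Split the distribution into two parallel lists: bigrams and their counts
for bigram, count in bigrams_freq.items():
    words_bigrams.append(bigram)
    values_bigrams.append(count)

def sort_them(w, v):
    # Pair each count with its bigram so the two stay together while sorting
    pairs = sorted(zip(v, w), key=lambda pair: pair[0], reverse=True)
    # Unpack back into two lists, ordered from biggest to smallest count
    words = [word for _, word in pairs]
    values = [value for value, _ in pairs]
    return words, values

sorted_words, sorted_values = sort_them(words_bigrams, values_bigrams)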
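
A shorter route may be worth trying as well. In NLTK 3, `nltk.FreqDist` is a subclass of `collections.Counter`, so its `most_common()` method should already return the (bigram, count) pairs ordered from highest to lowest frequency. The sketch below is only an illustration: it uses a plain `Counter` and a whitespace split so that it runs without NLTK's tokenizer data, and the helper name `bigrams_by_frequency` and the sample sentence are made up for the example.

```python
from collections import Counter

def bigrams_by_frequency(freq):
    """Return (bigram, count) pairs ordered from most to least frequent.

    `freq` is any Counter-like object; an nltk.FreqDist is assumed to
    work the same way because it subclasses collections.Counter.
    """
    return freq.most_common()

# Stand-in for nltk.word_tokenize(text): a simple whitespace split (assumption)
tokens = "the cat sat on the mat and the cat slept".split()

# Stand-in for nltk.FreqDist(nltk.bigrams(tokens)): count adjacent token pairs
bigram_counts = Counter(zip(tokens, tokens[1:]))

ranked = bigrams_by_frequency(bigram_counts)
for bigram, count in ranked[:3]:
    print(bigram, count)
```

With the code from the question, the equivalent call would be `fdist.most_common()`, or `fdist.most_common(10)` to keep only the ten most frequent bigrams.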
