I have a DataFrame with two columns, Parts of speech and word:
df:
   Parts of speech  word
0  Noun             cat
1  Noun             water
2  Noun             cat
3  verb             draw
4  verb             draw
5  adj              slow
I want to group the words by part of speech and count how often each word occurs (expected output):
Parts of speech  top
Noun             {'cat':2,'water':1}
verb             {'draw':2}
adj              {'slow':1}
I tried groupby with apply, but I did not get the result I need:
df2=df.groupby('Parts of speech')['word'].apply(lambda x : x.value_counts())
How can I create a dictionary of word counts for each part of speech?
One way is to aggregate with .agg + collections.Counter:
from collections import Counter
df2=df.groupby('Parts of speech')['word'].agg(Counter)
print(df2)
Parts of speech
Noun {'cat': 2, 'water': 1}
adj {'slow': 1}
verb {'draw': 2}
Name: word, dtype: object
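The values produced by agg(Counter) are collections.Counter objects (a dict subclass), so Counter helpers such as most_common remain available. Below is a minimal sketch, assuming the sample data from the question; the names counts, as_dicts and top_1 are illustrative only and not part of the original answer:

```python
from collections import Counter

import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Parts of speech": ["Noun", "Noun", "Noun", "verb", "verb", "adj"],
    "word": ["cat", "water", "cat", "draw", "draw", "slow"],
})

# One Counter per part of speech.
counts = df.groupby("Parts of speech")["word"].agg(Counter)

# Convert each Counter to a plain dict if a strict dict is required.
as_dicts = counts.map(lambda c: dict(c))

# Illustrative extra: keep only the single most frequent word per group.
top_1 = counts.map(lambda c: dict(c.most_common(1)))
```

If only the counts matter, the Counter objects can usually be used directly wherever a dict is expected.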
An alternative that uses value_counts (note the to_dict call at the end):
df2 = df.groupby('Parts of speech')['word'].agg(lambda x: x.value_counts().to_dict())
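To see why the original apply attempt differs, the sketch below runs both versions on the sample data. As far as I know, apply with value_counts returns one row per (part of speech, word) pair under a two-level index, while agg with to_dict collapses each group into a single dict. The names grouped, flat and result are illustrative, and the final rename/reset_index step is just one possible way to reproduce the "top" column from the question:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Parts of speech": ["Noun", "Noun", "Noun", "verb", "verb", "adj"],
    "word": ["cat", "water", "cat", "draw", "draw", "slow"],
})

grouped = df.groupby("Parts of speech")["word"]

# The attempt from the question: a Series indexed by (group, word).
flat = grouped.apply(lambda x: x.value_counts())

# The suggested fix: one dict of counts per group.
df2 = grouped.agg(lambda x: x.value_counts().to_dict())

# Optional: turn the result into a two-column DataFrame with a "top" column.
result = df2.rename("top").reset_index()
print(result)
```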