Group by one column and count multiple categories in Pandas



I have a dataset df. I want to group by one column and then get the count of each category in a second column.

name    location    sku
svc1    ny          hey1
svc2    ny          hey1
svc3    ny          hey1
svc4    ny          hey1
lo1     ny          ok1
lo2     ny          ok1
fab1    ny          hi
fab2    ny          hi
fab3    ny          hi
hello   ca          no
hello   ca          no

Desired output:

location    sku     count
ny          hey1    4
ny          ok1     2
ny          hi      3
ca          no      2


df2 = pd.DataFrame()
df2['sku'] = df.groupby('location')['sku'].nth(0)
df2['count'] = df.groupby('sku').count()

However, I get NaN for the counts, and not all of the data is listed under sku.

Any suggestions are welcome.

You want to group by both columns:

df.groupby(['location','sku']).size().reset_index(name='count')

Or group by one column and use value_counts:

# this should be slightly faster
(df.groupby('location')['sku'].value_counts()
.reset_index(name='count'))

Output:

  location   sku  count
0       ca    no      2
1       ny  hey1      4
2       ny    hi      3
3       ny   ok1      2
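
To try both approaches end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question by hand; the variable names by_size and by_value_counts are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["svc1", "svc2", "svc3", "svc4", "lo1", "lo2",
             "fab1", "fab2", "fab3", "hello", "hello"],
    "location": ["ny"] * 9 + ["ca"] * 2,
    "sku": ["hey1"] * 4 + ["ok1"] * 2 + ["hi"] * 3 + ["no"] * 2,
})

# Approach 1: group by both columns and count the rows in each group.
# (by_size is an illustrative name.)
by_size = df.groupby(["location", "sku"]).size().reset_index(name="count")

# Approach 2: group by location, then count each sku within that group.
# (by_value_counts is an illustrative name.)
by_value_counts = (
    df.groupby("location")["sku"]
    .value_counts()
    .reset_index(name="count")
)

print(by_size)
print(by_value_counts)
```

Both results hold the same (location, sku, count) rows. The row order is not guaranteed to match in general: size() orders rows by the group keys, while value_counts() by default orders each location's rows by descending count. If a specific order matters, sort explicitly with sort_values.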
