I have a dataset df. I want to group by one column and then get the count of each category in a second column.
name location sku
svc1 ny hey1
svc2 ny hey1
svc3 ny hey1
svc4 ny hey1
lo1 ny ok1
lo2 ny ok1
fab1 ny hi
fab2 ny hi
fab3 ny hi
hello ca no
hello ca no
Desired output:
location sku count
ny hey1 4
ny ok1 2
ny hi 3
ca no 2
What I tried:
df2 = pd.DataFrame()
df2['sku'] = df.groupby('location')['sku'].nth(0)
df2['count'] = df.groupby('sku').count()
However, I get NaN for the counts, and not all of the sku values are listed.
Any suggestions are welcome.
You want to group by both columns:
df.groupby(['location','sku']).size().reset_index(name='count')
Or group by one column and use value_counts:
# this should be slightly faster
(df.groupby('location')['sku'].value_counts()
.reset_index(name='count'))
Output:
location sku count
0 ca no 2
1 ny hey1 4
2 ny hi 3
3 ny ok1 2
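For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and runs both approaches. The variable names (by_size, by_value_counts, in_order) are only illustrative. The last variant is an addition not in the answer above: it assumes you want the rows in order of first appearance, as in the desired output, and passes sort=False to groupby for that.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "name": ["svc1", "svc2", "svc3", "svc4", "lo1", "lo2",
             "fab1", "fab2", "fab3", "hello", "hello"],
    "location": ["ny"] * 9 + ["ca"] * 2,
    "sku": ["hey1"] * 4 + ["ok1"] * 2 + ["hi"] * 3 + ["no"] * 2,
})

# Option 1: group by both columns and count the rows in each group.
by_size = df.groupby(["location", "sku"]).size().reset_index(name="count")

# Option 2: group by location, then count each sku within that location.
by_value_counts = (
    df.groupby("location")["sku"].value_counts().reset_index(name="count")
)

# Assumption: to keep groups in order of first appearance (ny before ca),
# disable the default sorting of group keys.
in_order = (
    df.groupby(["location", "sku"], sort=False).size().reset_index(name="count")
)

print(by_size)
print(in_order)
```

Both options produce one row per (location, sku) pair with its count. By default groupby sorts the group keys, which is why ca appears before ny in the output above.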