Count rows per category after GROUP BY in SQL or Pandas

I have a dataframe df:

ORDERID    PRODUCTTYPE  PRODUCTID  PRODUCT
123         Fruits         2       Banana
123         Vegetables     3       Tomato
123         Vegetables     3       Onion
321         Fruits         2       Grapes
321         Fruits         2       Avocado

I need this as the output:

ORDERID  FRUITS  VEGETABLES
123       1          2
321       2          0

Do I need to change something in the groupby? I am doing:

df.groupby('ORDERID')['PRODUCTTYPE'].nunique().reset_index(name="count")

but this only counts how many distinct categories each order has.
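To see the difference, here is a small reproducible sketch that rebuilds the sample frame from the table above. nunique() returns the number of distinct types per order, while counting rows per (ORDERID, PRODUCTTYPE) pair with size() and then unstack() gives one column per type. The size()/unstack() variant is an alternative added for illustration, not something from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ORDERID": [123, 123, 123, 321, 321],
    "PRODUCTTYPE": ["Fruits", "Vegetables", "Vegetables", "Fruits", "Fruits"],
    "PRODUCTID": [2, 3, 3, 2, 2],
    "PRODUCT": ["Banana", "Tomato", "Onion", "Grapes", "Avocado"],
})

# nunique() counts how many *different* product types each order contains
distinct_types = df.groupby("ORDERID")["PRODUCTTYPE"].nunique()

# size() counts *rows* per (order, type) pair; unstack() turns the types
# into columns and fill_value=0 covers pairs that never occur
per_type = (
    df.groupby(["ORDERID", "PRODUCTTYPE"])
      .size()
      .unstack(fill_value=0)
)

print(distinct_types)
print(per_type)
```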

A combination of groupby().value_counts() and unstack():

(df.groupby('ORDERID')['PRODUCTTYPE']
.value_counts()
.unstack('PRODUCTTYPE', fill_value=0)
)
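A self-contained version of this answer, assuming the sample data from the question. The extra rename_axis(), reset_index() and rename() steps are optional; they only flatten the result into the ORDERID / FRUITS / VEGETABLES layout the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "ORDERID": [123, 123, 123, 321, 321],
    "PRODUCTTYPE": ["Fruits", "Vegetables", "Vegetables", "Fruits", "Fruits"],
    "PRODUCTID": [2, 3, 3, 2, 2],
    "PRODUCT": ["Banana", "Tomato", "Onion", "Grapes", "Avocado"],
})

result = (
    df.groupby("ORDERID")["PRODUCTTYPE"]
      .value_counts()                            # rows per (order, type)
      .unstack("PRODUCTTYPE", fill_value=0)      # one column per type
      .rename_axis(columns=None)                 # drop the "PRODUCTTYPE" axis label
      .reset_index()                             # ORDERID back to a regular column
      .rename(columns=str.upper)                 # Fruits -> FRUITS, etc.
)
print(result)
```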

Or pivot_table:

df.pivot_table(index='ORDERID', 
columns='PRODUCTTYPE', 
values='PRODUCTID', 
aggfunc='count',
fill_value=0)
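One caveat worth checking: aggfunc='count' counts the non-missing values of the values column, so the choice of column matters if it can contain NaN. A sketch with a hypothetical missing PRODUCTID on the Onion row (this NaN is invented for illustration, it is not in the question's data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ORDERID": [123, 123, 123, 321, 321],
    "PRODUCTTYPE": ["Fruits", "Vegetables", "Vegetables", "Fruits", "Fruits"],
    # hypothetical: the Onion row has no PRODUCTID
    "PRODUCTID": [2, 3, np.nan, 2, 2],
    "PRODUCT": ["Banana", "Tomato", "Onion", "Grapes", "Avocado"],
})

# Counts non-NaN PRODUCTID values, so the Onion row is skipped
by_id = df.pivot_table(index="ORDERID", columns="PRODUCTTYPE",
                       values="PRODUCTID", aggfunc="count", fill_value=0)

# Counting a column that is never missing (PRODUCT here) counts every row
by_row = df.pivot_table(index="ORDERID", columns="PRODUCTTYPE",
                        values="PRODUCT", aggfunc="count", fill_value=0)

print(by_id)
print(by_row)
```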

Output:

PRODUCTTYPE  Fruits  Vegetables
ORDERID                        
123               1           2
321               2           0

In SQL, you can use conditional aggregation:

select
orderid,
sum(case when producttype = 'Fruits' then 1 else 0 end) fruits,
sum(case when producttype = 'Vegetables' then 1 else 0 end) vegetables
from mytable
group by orderid
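A quick way to try the query without a database server is Python's built-in sqlite3 module with an in-memory table. The table name mytable and its column types are assumptions, and the ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

rows = [
    (123, "Fruits", 2, "Banana"),
    (123, "Vegetables", 3, "Tomato"),
    (123, "Vegetables", 3, "Onion"),
    (321, "Fruits", 2, "Grapes"),
    (321, "Fruits", 2, "Avocado"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (orderid integer, producttype text, "
             "productid integer, product text)")
conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)

# Each SUM adds 1 only for rows of the matching type, 0 otherwise
query = """
select
    orderid,
    sum(case when producttype = 'Fruits' then 1 else 0 end) fruits,
    sum(case when producttype = 'Vegetables' then 1 else 0 end) vegetables
from mytable
group by orderid
order by orderid
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```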

Or, if your database supports the modern FILTER clause on aggregate functions:

select
orderid,
count(*) filter(where producttype = 'Fruits') fruits,
count(*) filter(where producttype = 'Vegetables') vegetables
from mytable
group by orderid
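The same in-memory sqlite3 setup can run the FILTER form. This assumes the SQLite library linked into Python is 3.30.0 or newer (check sqlite3.sqlite_version), since older SQLite versions do not accept FILTER on ordinary aggregate functions; mytable is again a hypothetical table name.

```python
import sqlite3

rows = [
    (123, "Fruits", 2, "Banana"),
    (123, "Vegetables", 3, "Tomato"),
    (123, "Vegetables", 3, "Onion"),
    (321, "Fruits", 2, "Grapes"),
    (321, "Fruits", 2, "Avocado"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (orderid integer, producttype text, "
             "productid integer, product text)")
conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)

# FILTER restricts which rows each COUNT(*) sees
query = """
select
    orderid,
    count(*) filter (where producttype = 'Fruits') fruits,
    count(*) filter (where producttype = 'Vegetables') vegetables
from mytable
group by orderid
order by orderid
"""
result = conn.execute(query).fetchall()
print(sqlite3.sqlite_version, result)
conn.close()
```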

Pandas also has a dedicated function for this, pd.crosstab:

pd.crosstab(df.ORDERID, df.PRODUCTTYPE)

Output:
PRODUCTTYPE  Fruits  Vegetables
ORDERID                        
123               1           2
321               2           0
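crosstab can also append row and column totals with margins=True (labelled All by default). A small sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "ORDERID": [123, 123, 123, 321, 321],
    "PRODUCTTYPE": ["Fruits", "Vegetables", "Vegetables", "Fruits", "Fruits"],
    "PRODUCTID": [2, 3, 3, 2, 2],
    "PRODUCT": ["Banana", "Tomato", "Onion", "Grapes", "Avocado"],
})

# Frequency table of ORDERID x PRODUCTTYPE, plus an "All" row and column
table = pd.crosstab(df.ORDERID, df.PRODUCTTYPE, margins=True)
print(table)
```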

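Or with window functions. Fruits and Vegetables are values of PRODUCTTYPE, not columns, so each count needs a CASE expression, and DISTINCT (rather than GROUP BY) collapses the repeated rows:

select distinct
orderid,
-- COUNT(expr) ignores NULLs, so a CASE without ELSE counts only matching rows
count(case when producttype = 'Fruits' then 1 end) over (partition by orderid) as fruits,
count(case when producttype = 'Vegetables' then 1 end) over (partition by orderid) as vegetables
from mytable;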
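A runnable sketch of the window-function approach in SQLite (window functions need SQLite 3.25.0 or newer), assuming the same hypothetical mytable. Each type is counted with a CASE expression inside COUNT(...) OVER (PARTITION BY orderid), and SELECT DISTINCT collapses the identical per-row results to one row per order.

```python
import sqlite3

rows = [
    (123, "Fruits", 2, "Banana"),
    (123, "Vegetables", 3, "Tomato"),
    (123, "Vegetables", 3, "Onion"),
    (321, "Fruits", 2, "Grapes"),
    (321, "Fruits", 2, "Avocado"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (orderid integer, producttype text, "
             "productid integer, product text)")
conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)

# Window counts are computed per row; DISTINCT then keeps one row per order
query = """
select distinct
    orderid,
    count(case when producttype = 'Fruits' then 1 end)
        over (partition by orderid) as fruits,
    count(case when producttype = 'Vegetables' then 1 end)
        over (partition by orderid) as vegetables
from mytable
order by orderid
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```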
