I have a DataFrame df:
ORDERID PRODUCTTYPE PRODUCTID PRODUCT
123 Fruits 2 Banana
123 Vegetables 3 Tomato
123 Vegetables 3 Onion
321 Fruits 2 Grapes
321 Fruits 2 Avocado
I need this as the output:
ORDERID FRUITS VEGETABLES
123 1 2
321 2 0
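Do I need to change my groupby in some way? I am currently doing
df.groupby('ORDERID')['PRODUCTTYPE'].nunique().reset_index(name="count")
but this only counts the number of distinct product types per order, not the number of rows in each type.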
Use a combination of groupby().value_counts() and unstack():
(df.groupby('ORDERID')['PRODUCTTYPE']
   .value_counts()
   .unstack('PRODUCTTYPE', fill_value=0)
)
Or use pivot_table:
df.pivot_table(index='ORDERID',
               columns='PRODUCTTYPE',
               values='PRODUCTID',
               aggfunc='count',
               fill_value=0)
Output:
PRODUCTTYPE Fruits Vegetables
ORDERID
123 1 2
321 2 0
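As a self-contained check, the two approaches above can be run side by side on the sample data from the question. This is only a sketch: the DataFrame is rebuilt by hand from the rows shown, and the names counts_groupby and counts_pivot are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ORDERID": [123, 123, 123, 321, 321],
    "PRODUCTTYPE": ["Fruits", "Vegetables", "Vegetables", "Fruits", "Fruits"],
    "PRODUCTID": [2, 3, 3, 2, 2],
    "PRODUCT": ["Banana", "Tomato", "Onion", "Grapes", "Avocado"],
})

# Approach 1: count rows per (ORDERID, PRODUCTTYPE) pair, then move the
# PRODUCTTYPE index level into columns, filling missing pairs with 0.
counts_groupby = (
    df.groupby("ORDERID")["PRODUCTTYPE"]
      .value_counts()
      .unstack("PRODUCTTYPE", fill_value=0)
)

# Approach 2: pivot_table, counting non-null PRODUCTID values per cell.
counts_pivot = df.pivot_table(
    index="ORDERID",
    columns="PRODUCTTYPE",
    values="PRODUCTID",
    aggfunc="count",
    fill_value=0,
)

print(counts_groupby)
print(counts_pivot)
```

Note that pivot_table with aggfunc='count' counts non-null values of the chosen column, so it would undercount if PRODUCTID contained missing values, whereas value_counts counts the PRODUCTTYPE entries themselves.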
In SQL, you can use conditional aggregation:
select
orderid,
sum(case when producttype = 'Fruits' then 1 else 0 end) fruits,
sum(case when producttype = 'Vegetables' then 1 else 0 end) vegetables
from mytable
group by orderid
Or, if your database supports the standard FILTER clause on aggregate functions:
select
orderid,
count(*) filter(where producttype = 'Fruits') fruits,
count(*) filter(where producttype = 'Vegetables') vegetables
from mytable
group by orderid
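To try the conditional-aggregation query without a database server, it can be run against an in-memory SQLite database from Python's standard library. This is a sketch under assumptions: the column types in the create table statement are guesses based on the sample data, and an order by clause is added only to make the printed result deterministic.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# The column types are assumed; the question only shows the values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable ("
    "orderid integer, producttype text, productid integer, product text)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        (123, "Fruits", 2, "Banana"),
        (123, "Vegetables", 3, "Tomato"),
        (123, "Vegetables", 3, "Onion"),
        (321, "Fruits", 2, "Grapes"),
        (321, "Fruits", 2, "Avocado"),
    ],
)

# Conditional aggregation: each sum adds 1 only for rows of the matching type.
rows = conn.execute(
    """
    select
        orderid,
        sum(case when producttype = 'Fruits' then 1 else 0 end) as fruits,
        sum(case when producttype = 'Vegetables' then 1 else 0 end) as vegetables
    from mytable
    group by orderid
    order by orderid
    """
).fetchall()
conn.close()

print(rows)  # [(123, 1, 2), (321, 2, 0)]
```

The sum(case ...) form is used here rather than FILTER because it works on older database versions as well.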
pandas has a function for exactly this, pd.crosstab:
pd.crosstab(df.ORDERID,df.PRODUCTTYPE)
PRODUCTTYPE Fruits Vegetables
ORDERID
123 1 2
321 2 0
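The crosstab result keeps ORDERID as the index and PRODUCTTYPE as the columns name. If a flat table with upper-case headers like the one requested in the question is wanted, one possible follow-up is sketched below; the variable name result and the upper-casing step are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Only the two columns crosstab needs, copied from the question.
df = pd.DataFrame({
    "ORDERID": [123, 123, 123, 321, 321],
    "PRODUCTTYPE": ["Fruits", "Vegetables", "Vegetables", "Fruits", "Fruits"],
})

# Count occurrences of each PRODUCTTYPE per ORDERID.
result = pd.crosstab(df["ORDERID"], df["PRODUCTTYPE"])

# Upper-case the headers; assigning a plain list also drops the
# "PRODUCTTYPE" label that crosstab puts on the columns axis.
result.columns = [str(col).upper() for col in result.columns]

# Turn the ORDERID index back into an ordinary column.
result = result.reset_index()

print(result)
```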
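-- Window-function variant: count matching rows within each ORDERID partition;
-- distinct collapses the repeated per-row results to one row per order.
select distinct
ORDERID,
count(case when PRODUCTTYPE = 'Fruits' then 1 end) over (partition by ORDERID) as Fruits,
count(case when PRODUCTTYPE = 'Vegetables' then 1 end) over (partition by ORDERID) as Vegetables
from mytable;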