Suppose we have a DataFrame like the one below:
channel store units
Offline Bournemouth 62
Offline Kettering 90
Offline Manchester 145
Online Bournemouth 220
Online Kettering 212
Online Manchester 272
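My goal is to add two more columns: one holding the total units sold per channel, and one holding each store's share of its channel's total. In short, the output I want should look like this: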
channel store units units_per_channel store_share
Offline Bournemouth 62 297 0.21
Offline Kettering 90 297 0.30
Offline Manchester 145 297 0.49
Online Bournemouth 220 704 0.31
Online Kettering 212 704 0.30
Online Manchester 272 704 0.39
Is there a simple and elegant way to get this?
Do a .groupby() on channel and take the sum of units to get units_per_channel. Then simply divide units by units_per_channel:
import pandas as pd

df = pd.DataFrame(
    [['Offline', 'Bournemouth', 62],
     ['Offline', 'Kettering', 90],
     ['Offline', 'Manchester', 145],
     ['Online', 'Bournemouth', 220],
     ['Online', 'Kettering', 212],
     ['Online', 'Manchester', 272]],
    columns=['channel', 'store', 'units'],
)

# transform('sum') returns the channel total aligned to every original row
df['units_per_channel'] = df.groupby('channel')['units'].transform('sum')
# each store's share of its channel's total
df['store_share'] = df['units'] / df['units_per_channel']
Output:
print(df)
channel store units units_per_channel store_share
0 Offline Bournemouth 62 297 0.208754
1 Offline Kettering 90 297 0.303030
2 Offline Manchester 145 297 0.488215
3 Online Bournemouth 220 704 0.312500
4 Online Kettering 212 704 0.301136
5 Online Manchester 272 704 0.386364
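The output above carries six decimal places, while the table in the question shows shares rounded to two. As a minimal sketch of one way to match that, the same calculation can be written with assign and round(2). The names channel_totals, result and share_sums are illustrative only and do not come from the original answer; the sanity check at the end is an optional addition.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "channel": ["Offline"] * 3 + ["Online"] * 3,
        "store": ["Bournemouth", "Kettering", "Manchester"] * 2,
        "units": [62, 90, 145, 220, 212, 272],
    }
)

# Channel totals, broadcast back onto every row of that channel.
channel_totals = df.groupby("channel")["units"].transform("sum")

# assign returns a new DataFrame and leaves df unchanged.
result = df.assign(
    units_per_channel=channel_totals,
    store_share=(df["units"] / channel_totals).round(2),
)

# Optional sanity check on the unrounded shares:
# within each channel they should add up to 1.
share_sums = (df["units"] / channel_totals).groupby(df["channel"]).sum()

print(result)
```

Rounding is applied only to the stored column here, so the rounded shares of a channel need not add up to exactly 1; keep the unrounded values if they feed further calculations.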