Python: sum by group and show the result as an additional column



Suppose we have a DataFrame like the one below:

channel     store          units
Offline     Bournemouth    62
Offline     Kettering      90
Offline     Manchester     145
Online      Bournemouth    220
Online      Kettering      212
Online      Manchester     272

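I want to add two more columns: the total number of units sold per channel, and each store's share of the units within its channel. In short, the output I am aiming for looks like this: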

channel     store          units   units_per_channel  store_share
Offline     Bournemouth    62      297                0.21
Offline     Kettering      90      297                0.30
Offline     Manchester     145     297                0.49
Online      Bournemouth    220     704                0.31
Online      Kettering      212     704                0.30
Online      Manchester     272     704                0.39

Is there a simple and elegant way to get this?

Do a .groupby() on channel and use transform('sum') to get the per-channel sum of units on every row. Then simply divide units by units_per_channel:

import pandas as pd

df = pd.DataFrame(
    [['Offline', 'Bournemouth', 62],
     ['Offline', 'Kettering', 90],
     ['Offline', 'Manchester', 145],
     ['Online', 'Bournemouth', 220],
     ['Online', 'Kettering', 212],
     ['Online', 'Manchester', 272]],
    columns=['channel', 'store', 'units'],
)

df['units_per_channel'] = df.groupby('channel')['units'].transform('sum')
df['store_share'] = df['units'] / df['units_per_channel']

Output:

print(df)
   channel        store  units  units_per_channel  store_share
0  Offline  Bournemouth     62                297     0.208754
1  Offline    Kettering     90                297     0.303030
2  Offline   Manchester    145                297     0.488215
3   Online  Bournemouth    220                704     0.312500
4   Online    Kettering    212                704     0.301136
5   Online   Manchester    272                704     0.386364
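
The shares above are unrounded, while the desired output in the question shows two decimals. As a minimal sketch (the names result and share_totals are only illustrative, and assign() is just one of several ways to add the columns), the same transform('sum') idea can be combined with round(2), plus a quick check that the unrounded shares add up to 1 within each channel:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "channel": ["Offline"] * 3 + ["Online"] * 3,
        "store": ["Bournemouth", "Kettering", "Manchester"] * 2,
        "units": [62, 90, 145, 220, 212, 272],
    }
)

# transform('sum') returns a Series aligned with df's index: every row
# receives the total of its own channel, so no merge is needed afterwards.
units_per_channel = df.groupby("channel")["units"].transform("sum")

# assign() returns a new DataFrame and leaves df unchanged.
result = df.assign(
    units_per_channel=units_per_channel,
    store_share=(df["units"] / units_per_channel).round(2),
)

# Sanity check on the unrounded shares: they should sum to 1 per channel.
share_totals = (df["units"] / units_per_channel).groupby(df["channel"]).sum()

print(result)
print(share_totals)
```

transform is used here rather than a plain .sum() because a plain aggregation returns one row per channel, which would then have to be merged back onto the original rows.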
