从不同的数据框架进行Python列计算



我需要根据几个公式创建一个包含一列的df_2。我可以用group和merge创建多个df。但是,是否有一种有效的方法来实现这一目标呢?

给定df_1如下,

df_1 = pd.DataFrame([['A', '1/1/2021','SKU_1','Customer Backhaul','34,848','$-51,100'],
['A', '1/1/2021','SKU_1','FOB','75,357','$12,407,112'],
['A', '1/1/2021','SKU_1','Price','75,357','$12,407,112'],
['A', '1/1/2021','SKU_1','Vendor Freight - Delivered','40,511','$65,470'],
['B', '1/1/2021','SKU_1','Customer Backhaul','197,904','$-157,487'],
['B', '1/1/2021','SKU_1','FOB','931,866','$50,059,515'],
['B', '1/1/2021','SKU_1','Price','931,866','$62,333,500'],
['B', '1/1/2021','SKU_1','Vendor Freight - Delivered','740,355','$1,220,927']], 
columns=['Group', 'Month','ID','Cost Type','Volume','Order Cost'])

df_2中列的公式如下:

  • 运费=(客户回程)+供应商交货的绝对值
  • fob = fob
  • 价格=价格-客户回程
  • 体积=离岸价体积

df_2是预期结果,

Out[df_2]
Group   Month     ID      Cost Type   Volume    Cost
0   A   1/1/2021    SKU_1   Freight     75,357  $116,570
1   A   1/1/2021    SKU_1   FOB         75,357  $12,407,112
2   A   1/1/2021    SKU_1   Price       75,357  $12,434,063
3   B   1/1/2021    SKU_1   Freight     931,866 $1,378,414
4   B   1/1/2021    SKU_1   FOB         931,866 $50,059,515
5   B   1/1/2021    SKU_1   Price       931,866 $62,490,988

如果我们要做加减法,我们首先需要将数据转换为数值数据:

>>> df_1[['Volume', 'Order Cost']] = df_1[['Volume', 'Order Cost']].transform(
...      lambda s: pd.to_numeric(s.str.replace('^$|,', '', regex=True))
... )
>>> df_1
Group     Month     ID                   Cost Type  Volume  Order Cost
0     A  1/1/2021  SKU_1           Customer Backhaul   34848      -51100
1     A  1/1/2021  SKU_1                         FOB   75357    12407112
2     A  1/1/2021  SKU_1                       Price   75357    12407112
3     A  1/1/2021  SKU_1  Vendor Freight - Delivered   40511       65470
4     B  1/1/2021  SKU_1           Customer Backhaul  197904     -157487
5     B  1/1/2021  SKU_1                         FOB  931866    50059515
6     B  1/1/2021  SKU_1                       Price  931866    62333500
7     B  1/1/2021  SKU_1  Vendor Freight - Delivered  740355     1220927

然后我发现最优雅的方法来解决你的问题将是pivot你的数据框架,每个订单有一行,成本类型在列:

>>> cost = df_1.pivot(['Group', 'Month', 'ID'], 'Cost Type', 'Order Cost')
>>> cost
Cost Type             Customer Backhaul       FOB     Price  Vendor Freight - Delivered
Group Month    ID                                                                      
A     1/1/2021 SKU_1             -51100  12407112  12407112                       65470
B     1/1/2021 SKU_1            -157487  50059515  62333500                     1220927

然后你可以简单地写出你想要的操作:

>>> cost['Customer Backhaul'] + cost['Vendor Freight - Delivered']
Group  Month     ID   
A      1/1/2021  SKU_1      14370
B      1/1/2021  SKU_1    1063440
dtype: int64

因此,我们可以将您想要的df_2公式以简洁易读的方式组合在一起,然后取消pivot以获得您想要的.unstack()格式:

>>> df_2 = pd.concat({
...     'Freight': cost['Customer Backhaul'].abs() + cost['Vendor Freight - Delivered'],
...     'FOB': cost['FOB'],
...     'Price': cost['Price'] - cost['Customer Backhaul'],
... }, axis='columns', names=['Order Type']).stack('Order Type').to_frame('Cost').reset_index()
>>> df_2
Group     Month     ID Order Type      Cost
0     A  1/1/2021  SKU_1    Freight    116570
1     A  1/1/2021  SKU_1        FOB  12407112
2     A  1/1/2021  SKU_1      Price  12458212
3     B  1/1/2021  SKU_1    Freight   1378414
4     B  1/1/2021  SKU_1        FOB  50059515
5     B  1/1/2021  SKU_1      Price  62490987

选择FOB值后,可以使用合并获得Volume:

>>> fob_volume = df_1.loc[df_1['Cost Type'].eq('FOB'), ['Group', 'Month', 'ID', 'Volume']]
>>> fob_volume
Group     Month     ID  Volume
1     A  1/1/2021  SKU_1   75357
5     B  1/1/2021  SKU_1  931866
>>> df_2 = df_2.merge(fob_volume)
>>> df_2
Group     Month     ID Order Type      Cost  Volume
0     A  1/1/2021  SKU_1    Freight    116570   75357
1     A  1/1/2021  SKU_1        FOB  12407112   75357
2     A  1/1/2021  SKU_1      Price  12458212   75357
3     B  1/1/2021  SKU_1    Freight   1378414  931866
4     B  1/1/2021  SKU_1        FOB  50059515  931866
5     B  1/1/2021  SKU_1      Price  62490987  931866

最后,如果你想重新应用样式,可以使用.apply():

>>> df_2['Cost'] = df_2['Cost'].apply('${:,.0f}'.format)
>>> df_2['Volume'] = df_2['Volume'].apply('{:,.0f}'.format)
>>> df_2
Group     Month     ID Order Type         Cost   Volume
0     A  1/1/2021  SKU_1    Freight     $116,570   75,357
1     A  1/1/2021  SKU_1        FOB  $12,407,112   75,357
2     A  1/1/2021  SKU_1      Price  $12,458,212   75,357
3     B  1/1/2021  SKU_1    Freight   $1,378,414  931,866
4     B  1/1/2021  SKU_1        FOB  $50,059,515  931,866
5     B  1/1/2021  SKU_1      Price  $62,490,987  931,866