我需要根据几个公式创建一个包含一列的df_2。我可以用group和merge创建多个df。但是,是否有一种有效的方法来实现这一目标呢?
给定df_1如下,
df_1 = pd.DataFrame([['A', '1/1/2021','SKU_1','Customer Backhaul','34,848','$-51,100'],
['A', '1/1/2021','SKU_1','FOB','75,357','$12,407,112'],
['A', '1/1/2021','SKU_1','Price','75,357','$12,407,112'],
['A', '1/1/2021','SKU_1','Vendor Freight - Delivered','40,511','$65,470'],
['B', '1/1/2021','SKU_1','Customer Backhaul','197,904','$-157,487'],
['B', '1/1/2021','SKU_1','FOB','931,866','$50,059,515'],
['B', '1/1/2021','SKU_1','Price','931,866','$62,333,500'],
['B', '1/1/2021','SKU_1','Vendor Freight - Delivered','740,355','$1,220,927']],
columns=['Group', 'Month','ID','Cost Type','Volume','Order Cost'])
df_2中列的公式如下:
- 运费=(客户回程)+供应商交货的绝对值
- fob = fob
- 价格=价格-客户回程
- 体积=离岸价体积
df_2是预期结果,
Out[df_2]
Group Month ID Cost Type Volume Cost
0 A 1/1/2021 SKU_1 Freight 75,357 $116,570
1 A 1/1/2021 SKU_1 FOB 75,357 $12,407,112
2 A 1/1/2021 SKU_1 Price 75,357 $12,434,063
3 B 1/1/2021 SKU_1 Freight 931,866 $1,378,414
4 B 1/1/2021 SKU_1 FOB 931,866 $50,059,515
5 B 1/1/2021 SKU_1 Price 931,866 $62,490,988
如果我们要做加减法,我们首先需要将数据转换为数值数据:
>>> df_1[['Volume', 'Order Cost']] = df_1[['Volume', 'Order Cost']].transform(
... lambda s: pd.to_numeric(s.str.replace('^$|,', '', regex=True))
... )
>>> df_1
Group Month ID Cost Type Volume Order Cost
0 A 1/1/2021 SKU_1 Customer Backhaul 34848 -51100
1 A 1/1/2021 SKU_1 FOB 75357 12407112
2 A 1/1/2021 SKU_1 Price 75357 12407112
3 A 1/1/2021 SKU_1 Vendor Freight - Delivered 40511 65470
4 B 1/1/2021 SKU_1 Customer Backhaul 197904 -157487
5 B 1/1/2021 SKU_1 FOB 931866 50059515
6 B 1/1/2021 SKU_1 Price 931866 62333500
7 B 1/1/2021 SKU_1 Vendor Freight - Delivered 740355 1220927
然后我发现最优雅的方法来解决你的问题将是pivot你的数据框架,每个订单有一行,成本类型在列:
>>> cost = df_1.pivot(['Group', 'Month', 'ID'], 'Cost Type', 'Order Cost')
>>> cost
Cost Type Customer Backhaul FOB Price Vendor Freight - Delivered
Group Month ID
A 1/1/2021 SKU_1 -51100 12407112 12407112 65470
B 1/1/2021 SKU_1 -157487 50059515 62333500 1220927
然后你可以简单地写出你想要的操作:
>>> cost['Customer Backhaul'] + cost['Vendor Freight - Delivered']
Group Month ID
A 1/1/2021 SKU_1 14370
B 1/1/2021 SKU_1 1063440
dtype: int64
因此,我们可以将您想要的df_2
公式以简洁易读的方式组合在一起,然后取消pivot以获得您想要的.unstack()
格式:
>>> df_2 = pd.concat({
... 'Freight': cost['Customer Backhaul'].abs() + cost['Vendor Freight - Delivered'],
... 'FOB': cost['FOB'],
... 'Price': cost['Price'] - cost['Customer Backhaul'],
... }, axis='columns', names=['Order Type']).stack('Order Type').to_frame('Cost').reset_index()
>>> df_2
Group Month ID Order Type Cost
0 A 1/1/2021 SKU_1 Freight 116570
1 A 1/1/2021 SKU_1 FOB 12407112
2 A 1/1/2021 SKU_1 Price 12458212
3 B 1/1/2021 SKU_1 Freight 1378414
4 B 1/1/2021 SKU_1 FOB 50059515
5 B 1/1/2021 SKU_1 Price 62490987
选择FOB
值后,可以使用合并获得Volume:
>>> fob_volume = df_1.loc[df_1['Cost Type'].eq('FOB'), ['Group', 'Month', 'ID', 'Volume']]
>>> fob_volume
Group Month ID Volume
1 A 1/1/2021 SKU_1 75357
5 B 1/1/2021 SKU_1 931866
>>> df_2 = df_2.merge(fob_volume)
>>> df_2
Group Month ID Order Type Cost Volume
0 A 1/1/2021 SKU_1 Freight 116570 75357
1 A 1/1/2021 SKU_1 FOB 12407112 75357
2 A 1/1/2021 SKU_1 Price 12458212 75357
3 B 1/1/2021 SKU_1 Freight 1378414 931866
4 B 1/1/2021 SKU_1 FOB 50059515 931866
5 B 1/1/2021 SKU_1 Price 62490987 931866
最后,如果你想重新应用样式,可以使用.apply()
:
>>> df_2['Cost'] = df_2['Cost'].apply('${:,.0f}'.format)
>>> df_2['Volume'] = df_2['Volume'].apply('{:,.0f}'.format)
>>> df_2
Group Month ID Order Type Cost Volume
0 A 1/1/2021 SKU_1 Freight $116,570 75,357
1 A 1/1/2021 SKU_1 FOB $12,407,112 75,357
2 A 1/1/2021 SKU_1 Price $12,458,212 75,357
3 B 1/1/2021 SKU_1 Freight $1,378,414 931,866
4 B 1/1/2021 SKU_1 FOB $50,059,515 931,866
5 B 1/1/2021 SKU_1 Price $62,490,987 931,866