我有一个excel数据,格式如下:
Original Data Frame
Package FISCAL_YR SCENARIO PERIOD USD_AMT LY_USD_AMT CY_NetSales LY_NetSales
Canada 2021 Plan Per01 1.00 2.00 3.00 4.00
Africa 2021 Actual Per04 1.00 2.00 3.00 4.00
Africa 2021 Actual Per09 1.00 2.00 3.00 4.00
Brazil 2021 Plan Per11 1.00 2.00 3.00 4.00
Brazil 2021 Actual Per05 1.00 2.00 3.00 4.00
Africa 2021 Actual Per07 1.00 2.00 3.00 4.00
Mexico 2021 Plan Per10 1.00 2.00 3.00 4.00
Canada 2021 Actual Per02 1.00 2.00 3.00 4.00
为了简化我的计算,我试图将场景列中的值与最后4列中的数值进行适当分配:
Expected dataframe:
Actual Plan
Package Sum of USD_AMT Sum of LY_USD_AMT Sum of CY_NetSales Sum of LY_NetSales Sum of USD_AMT Sum of LY_USD_AMT Sum of CY_NetSales Sum of LY_NetSales
Africa 3 6 9 12
Brazil 2 4 6 8
Canada 1 2 3 4 1 2 3 4
Mexico 1 2 3 4
我正在Panda中尝试数据透视表选项,但它呈现了以下输出:
失败的解决方案:
pd_piv=pd.pivot_table(df_dummy,index=['Package', 'FISCAL_YR', 'PERIOD'],
columns=['SCENARIO'],
values=['USD_AMT', 'LY_USD_AMT', 'CY_NetSales', 'LY_NetSales'], aggfunc=np.sum, fill_value=0)
pd_piv.head()
CY_NetSales LY_NetSales LY_USD_AMT USD_AMT
SCENARIO Actual Plan WRKG_FCST Actual Plan WRKG_FCST Actual Plan WRKG_FCST Actual Plan WRKG_FCST
Package_SubCategory FISCAL_YR_NBR FISCAL_PERIOD_NBR
*数字没有显示,因为实际数据与完全不同
是否有任何方法可以获得上面显示的预期数据帧?
也许这就是您想要的:1-制作您的数据透视表:
import pandas as pd
import numpy as np
data={"package":["Canada","Africa","Africa","Brazil","Brazil","Africa","Mexico","Canada"],
"scenario":["Plan","Actual","Actual","Plan","Actual","Actual","Plan","Actual"],
"USD_AMT":[1,1,1,1,1,1,1,1,],
"LY_USD_AMT":[1,1,1,1,1,1,1,1,]}
df=pd.DataFrame(data)
pd_piv=pd.pivot_table(df,index=['package'],
columns=['scenario'],
values=['USD_AMT', 'LY_USD_AMT',], aggfunc=np.sum,fill_value=0)
结果:
LY_USD_AMT USD_AMT
scenario Actual Plan Actual Plan
package
Africa 3 0 3 0
Brazil 1 1 1 1
Canada 1 1 1 1
Mexico 0 1 0 1
2-交换索引级别:
pd_piv.columns=pd_piv.columns.swaplevel(0, 1)
pd_piv.sort_index(axis=1, level=0, inplace=True)
最终结果:
scenario Actual Plan
LY_USD_AMT USD_AMT LY_USD_AMT USD_AMT
package
Africa 3 3 0 0
Brazil 1 1 1 1
Canada 1 1 1 1
Mexico 0 0 1 1