我有两个数据框架。第一个数据框用于按地点和月份计算托运单的总成本。第二个是按地点和月份计算总成本(维护+燃料)。
我的代码如下:def test():
start_date = '2022-01-01'
end_date = '2022-05-31'
###### First dataframe
# Get monthly delivery log cost
total_delivery_log_cost = monthly_delivery_log_cost_by_branch(start_date, end_date)
# Get monthly pickup cost
total_pickup_log_cost = monthly_pickup_log_cost_by_branch(start_date, end_date)
# Union total monthly cost (CN)
df = pd.concat([total_delivery_log_cost, total_pickup_log_cost])
# Pivot
df = df.pivot_table(index=['location'],columns =['report_month'], aggfunc = np.sum, fill_value=0)
df.columns = df.columns.to_flat_index().to_series().apply(lambda x: x[1])
df = df.reset_index()
df.to_csv('total_CN.csv')
###### Second dataframe
# Get monthly vehicle cost:
total_maintainence_cost = monthly_maintainence_cost_by_branch(start_date, end_date)
print(total_maintainence_cost)
# Get monthly refuel cost:
total_refuel_cost = monthly_refuel_cost_by_branch(start_date, end_date)
print(total_refuel_cost)
# Union total monthly maintainence cost
df = pd.concat([total_maintainence_cost, total_refuel_cost])
# Pivot
df = df.pivot_table(index=['location'],columns =['report_month'], aggfunc = np.sum, fill_value=0)
df.columns = df.columns.to_flat_index().to_series().apply(lambda x: x[1])
df = df.reset_index()
df.to_csv('total_cost.csv')
print(df)
print("Success")
我的第一个数据帧如下所示(总CN):
location 2022-01 2022-02 2022-03 2022-04 2022-05
ABC 22.00 24.00 60.20 55.30 66.43
XYZ 50.00 40.33 14.50 50.60 90.40
XXX 10.00 21.20 22.40 23.40 22.11
... ... ... .... .... .....
第二个数据框如下所示(Total Cost):
location 2022-01 2022-02 2022-03 2022-04 2022-05
ABC 30.00 33.00 5.20 65.30 12.43
XYZ 67.00 21.33 5.50 21.60 42.40
QWE 10.00 34.20 53.40 34.40 22.11
... ... ... .... .... .....
当我有这两个数据帧时,我想把它们组合成多索引,这是我想要实现的,所以最终的DF看起来像:
2022-01 2022-02 .......
location Total CN Total Cost Total CN Total Cost .......
ABC 22.00 30.00 24.00 33.00
XYZ 50.00 67.00 40.33 21.33
XXX 10.00 0.00 21.20 0.00
QWE 0.00 10.00 0.00 34.20
.... .... .... .... .....
我如何做到这一点?
您可以将数据框与键连接起来,以创建具有多索引的新数据框,并交换索引级别:
df_total_cn = pd.read_csv("total_cn.csv", index_col='location')
df_total_cost = pd.read_csv("total_cost.csv", index_col='location')
df = pd.concat([df_total_cn, df_total_cost], axis=1, keys=['total cn', 'total cost']).swaplevel(axis=1).sort_index('columns').fillna(0)
2022-01 2022-02 2022-03 2022-04 2022-05
total cn total cost total cn total cost total cn total cost total cn total cost total cn total cost
location
ABC 22.0 30.0 24.00 33.00 60.2 5.2 55.3 65.3 66.43 12.43
XYZ 50.0 67.0 40.33 21.33 14.5 5.5 50.6 21.6 90.40 42.40
XXX 10.0 0.0 21.20 0.00 22.4 0.0 23.4 0.0 22.11 0.00
QWE 0.0 10.0 0.00 34.20 0.0 53.4 0.0 34.4 0.00 22.11
当总列缺少时
df = pd.concat([df_total_cn, df_total_cost], axis=1, keys=['total cn', 'total cost']).swaplevel(axis=1)
for i in ['2022-01', '2022-02', '2022-03', '2022-04', '2022-05', ]:
for j in ['total cn', 'total cost']:
if j not in df[i].columns:
df.loc[:, (i, j)] = 0
df = df.sort_index('columns').fillna(0)
输出:
2022-01 2022-02 2022-03 2022-04 2022-05
total cn total cost total cn total cost total cn total cost total cn total cost total cn total cost
location
ABC 22.0 0 24.00 33.00 60.2 5.2 55.3 65.3 66.43 12.43
XYZ 50.0 0 40.33 21.33 14.5 5.5 50.6 21.6 90.40 42.40
XXX 10.0 0 21.20 0.00 22.4 0.0 23.4 0.0 22.11 0.00
QWE 0.0 0 0.00 34.20 0.0 53.4 0.0 34.4 0.00 22.11