我如何将两个不同的数据框架组合成一个数据透视表,然后在python中添加列来计算差异?



我有两个按月计算的保险索赔数据透视表,供应商和另一个来源,一个数据透视表有2个月的数据(假设这是"旧的");表),另一个有相同的2个月+ 1个月的数据(称之为"新";表)。我想做的是把它们并排放在一起,并添加两个额外的列,以月为单位减去新减去旧的差,然后计算变化的百分比。这两个表基于两个数据帧,df和hdf。

表是这样的(旧表,新表只包括202003):

PAID                                                   
SOURCE                       IP          OP           PR
CLAIM_VENDOR PYYMM                                                             
VENDOR 1     202001         NaN        0.00          NaN 
202002         NaN        0.00      3150.00
VENDOR 2     202001         NaN         NaN   1197812.00
202002         NaN         NaN   1837917.00
VENDOR 3     202001  9005941.61  8987012.39  16890979.44
202002  5158238.02  7796918.67  16888689.98
Old = pd.pivot_table(df, 
values = ["PAID"], 
index = ["CLAIM_VENDOR","PYYMM"], 
columns = ["SOURCE"],
aggfunc=np.sum
)
New = pd.pivot_table(hdf, 
values = ["PAID"], 
index = ["CLAIM_VENDOR","PYYMM"], 
columns = ["SOURCE"],
aggfunc=np.sum
)

所以最终的目标是让它们并排,然后添加6个额外的列,取每个"源"的差异。按月计算,然后是百分比差异。对于这个项目,即使在相同的月份,预计new也会有不同的值。

对不起,如果我做了不好的stackoverflow格式,以前没有使用过这个。

import pandas as pd
# Create old table
old_df_index = pd.MultiIndex(
[["VENDOR 1", "VENDOR 2", "VENDOR 3"], ["202001", "202002"]],
[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
names=["CLAIM_VENDOR", "PYYMM"]
)
old_df_index.name = "SOURCE"
old_df = pd.DataFrame(
{
"IP": [None, None, None, None, 9005941.61, 5158238.02],
"OP": [0, 0, None, None, 8987012.39, 7796918.67],
"PR": [None, 3150, 1197812, 1837917, 16890979.44, 16888689.98]
},
index=old_df_index,
columns=pd.Index(["IP", "OP", "PR"], name="PAID")
)
# Create new table
new_df_index = pd.MultiIndex.from_product(
[["VENDOR 1", "VENDOR 2", "VENDOR 3"], ["202001", '202002', "202003"]],
names=["CLAIM_VENDOR", "PYYMM"]
)
new_df_index.name = "SOURCE"
new_df = pd.DataFrame(
{
"IP": [None, None, None, None, None, 8675309*1.5, 9005941.61*1.7, 5158238.02*1.9, 2020567*2.1],
"OP": [0, 0, 0, None, None, None, 8987012.39*1.25, 7796918.67*1.45, 125964816*1.65],
"PR": [None, 3150*0.75, 2210*0.7, 1197812*0.9, 1837917*0.99, 1818956*0.6, 16890979.44*1.2, 16888689.98*1.1, 74747687*3.1]
},
index=new_df_index,
columns=pd.Index(["IP", "OP", "PR"], name="PAID")
)
# We want the index to include CLAIM_VENDOR, PYYMM and PAID for the join below
old_df_for_join = (
old_df.melt(ignore_index=False)
.reset_index()
.set_index(["CLAIM_VENDOR", "PYYMM", "PAID"])
)
new_df_for_join = (
new_df.melt(ignore_index=False)
.reset_index()
.set_index(["CLAIM_VENDOR", "PYYMM", "PAID"])
)
# Use .join to combine based on the index
final_table = (
new_df_for_join.join(
old_df_for_join,
lsuffix="new",
rsuffix="old"
)
.assign(
difference=lambda DF: DF.valuenew - DF.valueold,
pct_diff = lambda DF: DF.difference / DF.valueold
)
.unstack(level="PAID")
)
print(final_table)

结果

valuenew                                valueold                            difference                            pct_diff             
PAID                           IP            OP            PR          IP          OP           PR           IP            OP           PR       IP    OP    PR 
CLAIM_VENDOR PYYMM                                                                                                                                              
VENDOR 1     202001           NaN  0.000000e+00           NaN         NaN        0.00          NaN          NaN  0.000000e+00          NaN      NaN   NaN   NaN 
202002           NaN  0.000000e+00  2.362500e+03         NaN        0.00      3150.00          NaN  0.000000e+00     -787.500      NaN   NaN -0.25 
202003           NaN  0.000000e+00  1.547000e+03         NaN         NaN          NaN          NaN           NaN          NaN      NaN   NaN   NaN 
VENDOR 2     202001           NaN           NaN  1.078031e+06         NaN         NaN   1197812.00          NaN           NaN  -119781.200      NaN   NaN -0.10 
202002           NaN           NaN  1.819538e+06         NaN         NaN   1837917.00          NaN           NaN   -18379.170      NaN   NaN -0.01 
202003  1.301296e+07           NaN  1.091374e+06         NaN         NaN          NaN          NaN           NaN          NaN      NaN   NaN   NaN 
VENDOR 3     202001  1.531010e+07  1.123377e+07  2.026918e+07  9005941.61  8987012.39  16890979.44  6304159.127  2.246753e+06  3378195.888      0.7  0.25  0.20 
202002  9.800652e+06  1.130553e+07  1.857756e+07  5158238.02  7796918.67  16888689.98  4642414.218  3.508613e+06  1688868.998      0.9  0.45  0.10 
202003  4.243191e+06  2.078419e+08  2.317178e+08         NaN         NaN          NaN          NaN           NaN          NaN      NaN   NaN   NaN 

相关内容

  • 没有找到相关文章

最新更新