在大熊猫的两个级别上匹配的三级多索引中查找行之间的月环比差异



我需要你的帮助。我一整天都盯着这个。如果最后4个数字列在级别1(bucket(和级别2(Salary(上匹配,但级别0是我需要评估的日期,我正试图找出它们在本月底和上个月底之间的差异。

Price   Cost    Value   Gross
Date        Bucket  Salary              
2021-10-31  30DPD   2 Missed    3.33    3.07    92,508.65   153,493.06
3 Missed    1.91    2.69    25,961.60   71,329.51
4-5 Missed  0.68    0.84    13,819.31   28,759.36
6-12 Missed 0.29    1.00    405.33      6,909.69

2021-11-30  30DPD   1 Missed    5.42    3.86    36,345.48   97,378.17
2 Missed    55.75   46.11   473,204.30  1,150,866.42
3 Missed    38.50   33.43   452,324.33  998,917.02
4-5 Missed  46.67   38.26   529,012.51  1,327,705.85
6-12 Missed 59.56   59.29   705,956.78  1,840,849.20

不知道如何使用multiIndexes的全部功能来访问数据和添加逻辑筛选器。

我尝试了以下方法,但没有成功。

summary['Price_diff'] = np.nan
for idx in summary.index.levels[0]:
summary.Price_diff[idx] = summary.Price[idx].diff()

我该如何做到这一点?

您的问题不清楚,但您正在寻找类似的东西吗?

>>> df.groupby(['Bucket', 'Salary']).diff()
Price   Cost      Value       Gross
Date       Bucket Salary                                          
2021-10-31 30DPD  2 Missed       NaN    NaN        NaN         NaN
3 Missed       NaN    NaN        NaN         NaN
4-5 Missed     NaN    NaN        NaN         NaN
6-12 Missed    NaN    NaN        NaN         NaN
2021-11-30 30DPD  1 Missed       NaN    NaN        NaN         NaN
2 Missed     52.42  43.04  380695.65   997373.36
3 Missed     36.59  30.74  426362.73   927587.51
4-5 Missed   45.99  37.42  515193.20  1298946.49
6-12 Missed  59.27  58.29  705551.45  1833939.51

设置可再现的MRE:

import pandas as pd
from pandas import Timestamp
data = {'Date': [Timestamp('2021-10-31 00:00:00'), Timestamp('2021-10-31 00:00:00'), Timestamp('2021-10-31 00:00:00'), Timestamp('2021-10-31 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00')], 
'Bucket': ['30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD'],
'Salary': ['2 Missed', '3 Missed', '4-5 Missed', '6-12 Missed', '1 Missed', '2 Missed', '3 Missed', '4-5 Missed', '6-12 Missed'], 
'Price': [3.33, 1.91, 0.68, 0.29, 5.42, 55.75, 38.5, 46.67, 59.56],
'Cost': [3.07, 2.69, 0.84, 1.0, 3.86, 46.11, 33.43, 38.26, 59.29],
'Value': [92508.65, 25961.6, 13819.31, 405.33, 36345.48, 473204.3, 452324.33, 529012.51, 705956.78],
'Gross': [153493.06, 71329.51, 28759.36, 6909.69, 97378.17, 1150866.42, 998917.02, 1327705.85, 1840849.2]}
df = pd.DataFrame(data).set_index(['Date', 'Bucket', 'Salary'])

最新更新