我需要你的帮助。我一整天都盯着这个。如果最后4个数字列在级别1(bucket(和级别2(Salary(上匹配,但级别0是我需要评估的日期,我正试图找出它们在本月底和上个月底之间的差异。
Price Cost Value Gross
Date Bucket Salary
2021-10-31 30DPD 2 Missed 3.33 3.07 92,508.65 153,493.06
3 Missed 1.91 2.69 25,961.60 71,329.51
4-5 Missed 0.68 0.84 13,819.31 28,759.36
6-12 Missed 0.29 1.00 405.33 6,909.69
2021-11-30 30DPD 1 Missed 5.42 3.86 36,345.48 97,378.17
2 Missed 55.75 46.11 473,204.30 1,150,866.42
3 Missed 38.50 33.43 452,324.33 998,917.02
4-5 Missed 46.67 38.26 529,012.51 1,327,705.85
6-12 Missed 59.56 59.29 705,956.78 1,840,849.20
不知道如何使用multiIndexes的全部功能来访问数据和添加逻辑筛选器。
我尝试了以下方法,但没有成功。
summary['Price_diff'] = np.nan
for idx in summary.index.levels[0]:
summary.Price_diff[idx] = summary.Price[idx].diff()
我该如何做到这一点?
您的问题不清楚,但您正在寻找类似的东西吗?
>>> df.groupby(['Bucket', 'Salary']).diff()
Price Cost Value Gross
Date Bucket Salary
2021-10-31 30DPD 2 Missed NaN NaN NaN NaN
3 Missed NaN NaN NaN NaN
4-5 Missed NaN NaN NaN NaN
6-12 Missed NaN NaN NaN NaN
2021-11-30 30DPD 1 Missed NaN NaN NaN NaN
2 Missed 52.42 43.04 380695.65 997373.36
3 Missed 36.59 30.74 426362.73 927587.51
4-5 Missed 45.99 37.42 515193.20 1298946.49
6-12 Missed 59.27 58.29 705551.45 1833939.51
设置可再现的MRE:
import pandas as pd
from pandas import Timestamp
data = {'Date': [Timestamp('2021-10-31 00:00:00'), Timestamp('2021-10-31 00:00:00'), Timestamp('2021-10-31 00:00:00'), Timestamp('2021-10-31 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00'), Timestamp('2021-11-30 00:00:00')],
'Bucket': ['30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD', '30DPD'],
'Salary': ['2 Missed', '3 Missed', '4-5 Missed', '6-12 Missed', '1 Missed', '2 Missed', '3 Missed', '4-5 Missed', '6-12 Missed'],
'Price': [3.33, 1.91, 0.68, 0.29, 5.42, 55.75, 38.5, 46.67, 59.56],
'Cost': [3.07, 2.69, 0.84, 1.0, 3.86, 46.11, 33.43, 38.26, 59.29],
'Value': [92508.65, 25961.6, 13819.31, 405.33, 36345.48, 473204.3, 452324.33, 529012.51, 705956.78],
'Gross': [153493.06, 71329.51, 28759.36, 6909.69, 97378.17, 1150866.42, 998917.02, 1327705.85, 1840849.2]}
df = pd.DataFrame(data).set_index(['Date', 'Bucket', 'Salary'])