我为每个ID都有一个这样的数据集;
月数 | ID | 工资||
---|---|---|---|
2020-12-01 | 1 | 5240.0 | |
2020-11-01 | 1 | 4532.0 | |
2020-10-01 | 1 | 4532.0 | |
2020-09-01 | 1 | 4532.0 | |
2020-08-01 | 1 | 4532.0 | |
2020-07-01 | 1 | 4532.0 | |
2020-06-01 | 1 | 4532.0 | |
2020-05-01 | 1 | 4532.0 | |
2020-04-01 | 1 | 4532.0 | |
2020-03-01 | 1 | 4532.0 | |
2020-02-01 | 1 | 3808.0 | |
2020-01-01 | 1 | 3808.0 | |
2019-12-01 | 1 | 3808.0 | |
2019-11-01 | 1 | 3808.0 | |
2019-10-01 | 1 | 3808.0 | |
2019-09-01 | 1 | 3808.0 | |
2019-08-01 | |||
2019-07-01 | 1 | 3808.0 | |
2019-06-01 | 1 | 3808.0 | |
2019-05-01 | 1 | 3808.0 | |
2019-04-01 | 1 | 3096.0 | |
2019-03-01 | 1 | 3096.0 | |
2019-02-01 | 1 | 3096.0 | |
2019-01-01 | 1 | 3096.0 | |
2018-12-01 | 1 | 3096.0 | |
2018-11-01 | 1 | 3096.0 | |
2018-10-01 | |||
2018-09-01 | 1 | 2669.0 | |
2018-08-01 | 1 | 2669.0 | |
2018-07-01 | 1 | 2669.0 | |
2020-12-01 | 2 | 1632.0 | |
2020年11月1日 | 2 | 1632.0 | |
2020-10-01 | 2 | 1632.0 | |
2020-09-01 | 2 | 1600.0 |
可能有更好的解决方案,但目前我使用了这样的方法。
#Find the percentage changes, but it remains Nan for the last months.
unique_all['SalaryDifferencewithLastMonth%'] = (unique_all.groupby(['ID'])['Salary'].pct_change(-1))
unique_all['SalaryDifferenceinLast6Months%'] = (unique_all.groupby(['ID'])['Salary'].pct_change(-6))
unique_all['SalaryDifferenceinLast12Months%'] = (unique_all.groupby(['ID'])['Salary'].pct_change(-12))
#Find the salary of the latest month(at the bottom) for each id
latest_salary = pd.DataFrame(unique_all.loc[unique_all.groupby('ID').Months.idxmin()][['ID','Salary']].reset_index(drop=True))
latest_salary = latest_salary.rename(columns={'Salary':'latest_salary'})
unique_all= unique_all.merge(latest_salary,on='ID',how='left')
#Find the percentage change with the latest month of the id and assign to the Nan values
unique_all['salary_diff2'] = ((unique_all['Salary'] - unique_all['latest_salary']) * 100 / unique_all['latest_salary']) / 100
unique_all['SalaryDifferenceinLast12Months%'] = unique_all['SalaryDifferenceinLast12Months%'].fillna(unique_all['salary_diff2'].loc[unique_all['SalaryDifferenceinLast12Months%'].isnull()])
unique_all['SalaryDifferenceinLast6Months%'] = unique_all['SalaryDifferenceinLast6Months%'].fillna(unique_all['salary_diff2'].loc[unique_all['SalaryDifferenceinLast6Months%'].isnull()])
unique_all['SalaryDifferencewithLastMonth%'] = unique_all['SalaryDifferencewithLastMonth%'].fillna(unique_all['salary_diff2'].loc[unique_all['SalaryDifferencewithLastMonth%'].isnull()])