获取大熊猫每个id在过去12个月(可变)内的百分比变化



我为每个ID都有一个这样的数据集;

工资
月数 ID
2020-12-01 1 5240.0
2020-11-01 1 4532.0
2020-10-01 1 4532.0
2020-09-01 1 4532.0
2020-08-01 1 4532.0
2020-07-01 1 4532.0
2020-06-01 1 4532.0
2020-05-01 1 4532.0
2020-04-01 1 4532.0
2020-03-01 1 4532.0
2020-02-01 1 3808.0
2020-01-01 1 3808.0
2019-12-01 1 3808.0
2019-11-01 1 3808.0
2019-10-01 1 3808.0
2019-09-01 1 3808.0
2019-08-01
2019-07-01 1 3808.0
2019-06-01 1 3808.0
2019-05-01 1 3808.0
2019-04-01 1 3096.0
2019-03-01 1 3096.0
2019-02-01 1 3096.0
2019-01-01 1 3096.0
2018-12-01 1 3096.0
2018-11-01 1 3096.0
2018-10-01
2018-09-01 1 2669.0
2018-08-01 1 2669.0
2018-07-01 1 2669.0
2020-12-01 2 1632.0
2020年11月1日 2 1632.0
2020-10-01 2 1632.0
2020-09-01 2 1600.0

可能有更好的解决方案,但目前我使用了这样的方法。

#Find the percentage changes, but it remains Nan for the last months.
unique_all['SalaryDifferencewithLastMonth%'] = (unique_all.groupby(['ID'])['Salary'].pct_change(-1))
unique_all['SalaryDifferenceinLast6Months%'] = (unique_all.groupby(['ID'])['Salary'].pct_change(-6))
unique_all['SalaryDifferenceinLast12Months%'] = (unique_all.groupby(['ID'])['Salary'].pct_change(-12))
#Find the salary of the latest month(at the bottom) for each id
latest_salary = pd.DataFrame(unique_all.loc[unique_all.groupby('ID').Months.idxmin()][['ID','Salary']].reset_index(drop=True))
latest_salary = latest_salary.rename(columns={'Salary':'latest_salary'})
unique_all= unique_all.merge(latest_salary,on='ID',how='left')

#Find the percentage change with the latest month of the id and assign to the Nan values
unique_all['salary_diff2'] = ((unique_all['Salary'] - unique_all['latest_salary']) * 100 / unique_all['latest_salary']) / 100
unique_all['SalaryDifferenceinLast12Months%'] = unique_all['SalaryDifferenceinLast12Months%'].fillna(unique_all['salary_diff2'].loc[unique_all['SalaryDifferenceinLast12Months%'].isnull()]) 
unique_all['SalaryDifferenceinLast6Months%'] = unique_all['SalaryDifferenceinLast6Months%'].fillna(unique_all['salary_diff2'].loc[unique_all['SalaryDifferenceinLast6Months%'].isnull()]) 
unique_all['SalaryDifferencewithLastMonth%'] = unique_all['SalaryDifferencewithLastMonth%'].fillna(unique_all['salary_diff2'].loc[unique_all['SalaryDifferencewithLastMonth%'].isnull()]) 

最新更新