How to create a lagged variable for a specific variable within each ID?

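I want to create a lagged variable named lag_ins that holds the previous year's value of ins for the same ID.

It should look like this: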

year  ID    emissions   ins    lag_ins
2010   1     10          0       NaN
2011   1     20          1       0
2012   1     30          1       1
2010   2     10          1       NaN
2011   2     20          0       1
2012   2     40          1       0

I used the following code:

df['ID'] = df.groupby(['year']).cumcount()+1
df4['lag_ins'] = np.insert(df.ins.values,0,0)[:1]
df.loc[df.groupby(["ID"]).cumcount() == 0,'lag_ins']= np.nan

But it doesn't work.
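For comparison, here is a minimal sketch of how the question's own approach (shift the whole column with np.insert, then blank out the first row of each ID) could be made to work, assuming the sample data above. The main fixes are slicing with [:-1] instead of [:1] (which keeps only one element), writing to df rather than df4, and converting to float so NaN can be stored. This approach only works when the rows of each ID are contiguous and already sorted by year; the groupby.shift answer below does not rely on the first condition.

```python
import numpy as np
import pandas as pd

# Sample data from the question: rows are grouped by ID and sorted by year.
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2010, 2011, 2012],
    "ID": [1, 1, 1, 2, 2, 2],
    "emissions": [10, 20, 30, 10, 20, 40],
    "ins": [0, 1, 1, 1, 0, 1],
})

# Shift the whole column down by one row: prepend a placeholder 0 and
# drop the last value. Cast to float so NaN can be assigned afterwards.
df["lag_ins"] = np.insert(df["ins"].to_numpy(), 0, 0)[:-1].astype(float)

# The first row of each ID received the previous ID's last value (or the
# placeholder), so set it to NaN.
df.loc[df.groupby("ID").cumcount() == 0, "lag_ins"] = np.nan
print(df)
```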

You can do this directly with groupby.shift:

df['lag_ins'] = df.groupby('ID').ins.shift()
df
#   year  ID  emissions  ins  lag_ins
#0  2010   1         10    0      NaN
#1  2011   1         20    1      0.0
#2  2012   1         30    1      1.0
#3  2010   2         10    1      NaN
#4  2011   2         20    0      1.0
#5  2012   2         40    1      0.0
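A self-contained version of the same idea, building the sample DataFrame from the question so it can be run as-is. groupby("ID")["ins"].shift(1) moves ins down one row inside each ID group, so the first row of every group gets NaN and the column becomes float. The extra lag2_ins column (a hypothetical name, not from the question) shows that the periods argument of shift gives longer lags the same way.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2010, 2011, 2012],
    "ID": [1, 1, 1, 2, 2, 2],
    "emissions": [10, 20, 30, 10, 20, 40],
    "ins": [0, 1, 1, 1, 0, 1],
})

# One-period lag within each ID; the first row of each group becomes NaN.
df["lag_ins"] = df.groupby("ID")["ins"].shift(1)

# Two-period lag (hypothetical extra column): the first two rows of each ID are NaN.
df["lag2_ins"] = df.groupby("ID")["ins"].shift(2)
print(df)
```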

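If the shift has to follow the year order (for example, when the rows are not already sorted by year within each ID), sort before shifting. Because the assignment aligns on the index, the lagged values are written back to the original rows: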

df['lag_ins'] = df.sort_values('year').groupby('ID').ins.shift()
df
#   year  ID  emissions  ins  lag_ins
#0  2010   1         10    0      NaN
#1  2011   1         20    1      0.0
#2  2012   1         30    1      1.0
#3  2010   2         10    1      NaN
#4  2011   2         20    0      1.0
#5  2012   2         40    1      0.0
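To see why the sort matters, here is a small sketch using the same ID/year/ins values as the question but with the rows deliberately shuffled (the emissions column is left out for brevity, and the naive column name is hypothetical). Without sorting, shift follows row order and produces wrong lags; with sort_values("year") the lag follows the calendar, and index alignment puts each value back on its original row.

```python
import pandas as pd

# Same values as the question, but rows are out of year order within each ID.
df = pd.DataFrame({
    "year": [2012, 2010, 2011, 2011, 2010, 2012],
    "ID": [1, 1, 1, 2, 2, 2],
    "ins": [1, 0, 1, 0, 1, 1],
})

# Naive: shifts by row position, so e.g. ID 1's 2012 row gets NaN (wrong).
df["naive"] = df.groupby("ID")["ins"].shift()

# Sorted: shift follows year order; assignment aligns on the original index.
df["lag_ins"] = df.sort_values("year").groupby("ID")["ins"].shift()
print(df.sort_values(["ID", "year"]))
```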
