I have a DataFrame with Date and ID columns and am trying to compute a 3-day rolling mean of Score separately for each ID.
Date ID Score
2022-01-02 1 1
2022-01-03 1 2
2022-01-04 1 1
2022-01-05 1 3
2022-01-02 2 5
2022-01-03 2 6
2022-01-04 2 7
2022-01-05 2 3
So far I only know how to compute a rolling mean over a single column, which ignores the second identifier, ID:
df['RollingMean3'] = df['Score'].rolling(3).mean()
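A quick check of why the one-liner above is not enough: it treats Score as one continuous series, so the windows at the start of ID 2 still contain ID 1's scores. This sketch rebuilds the sample frame from this question:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'] * 2,
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 1, 3, 5, 6, 7, 3],
})

# Rolling over the whole column ignores ID boundaries
naive = df['Score'].rolling(3).mean()

# Row 4 is the first row of ID 2, yet its window holds rows 2-4:
# scores 1 and 3 (ID 1) plus 5 (ID 2)
print(naive.iloc[4])  # 3.0
```

In the desired output, rows 4 and 5 should be NaN; here they get 3.0 and about 4.67.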
This is the result I am trying to get:
Date ID Score ScoreRollingMean3
2022-01-02 1 1 NaN
2022-01-03 1 2 NaN
2022-01-04 1 1 1.33
2022-01-05 1 3 2
2022-01-02 2 5 NaN
2022-01-03 2 6 NaN
2022-01-04 2 7 6
2022-01-05 2 3 5.33
For reproducibility:
import pandas as pd
df = pd.DataFrame({
'Date':['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
'ID':[1, 1, 1, 1, 2, 2, 2, 2],
'Score':[1, 2, 1, 3, 5, 6, 7, 3]})
Thanks!
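If the dates are consecutive within each ID, use DataFrame.groupby together with Series.droplevel. The grouped rolling result is indexed by (ID, original row label), so droplevel(0) removes the ID level and lets the result align with df's index on assignment: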
df['RollingMean3']=df.groupby('ID')['Score'].rolling(3).mean().droplevel(0)
print (df)
Date ID Score RollingMean3
0 2022-01-02 1 1 NaN
1 2022-01-03 1 2 NaN
2 2022-01-04 1 1 1.333333
3 2022-01-05 1 3 2.000000
4 2022-01-02 2 5 NaN
5 2022-01-03 2 6 NaN
6 2022-01-04 2 7 6.000000
7 2022-01-05 2 3 5.333333
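As an alternative to droplevel(0), groupby(...).transform returns a result that is already aligned to the original index. A minimal sketch, assuming the same sample data; it should produce the same RollingMean3 column as above:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'] * 2,
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 1, 3, 5, 6, 7, 3],
})

# transform runs the 3-row rolling mean on each ID's Score separately
# and returns a Series indexed like df, so no index level has to be dropped
df['RollingMean3'] = df.groupby('ID')['Score'].transform(lambda s: s.rolling(3).mean())
print(df)
```

Like rolling(3) itself, this counts rows rather than days, so it is only correct when each ID has one row per consecutive date.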
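A more general solution uses the time-based window '3D' (three calendar days), which also handles gaps in the dates; it requires a DatetimeIndex. Because an offset-based window defaults to min_periods=1, the first rows of each ID get a partial mean instead of NaN: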
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df['RollingMean3']=df.groupby('ID')['Score'].rolling('3D').mean().droplevel(0)
print (df)
ID Score RollingMean3
Date
2022-01-02 1 1 1.000000
2022-01-03 1 2 1.500000
2022-01-04 1 1 1.333333
2022-01-05 1 3 2.000000
2022-01-02 2 5 5.000000
2022-01-03 2 6 5.500000
2022-01-04 2 7 6.000000
2022-01-05 2 3 5.333333
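The two approaches only diverge when a date is missing. The sketch below uses hypothetical data in which ID 1 has no row for 2022-01-04, and compares the row-count window rolling(3) with the calendar window rolling('3D'), whose default window is closed on the right, i.e. (t - 3 days, t]:

```python
import pandas as pd

# Hypothetical data: ID 1 has no row for 2022-01-04, ID 2 is complete
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-02', '2022-01-03', '2022-01-05', '2022-01-06',
                            '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05']),
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 3, 4, 5, 6, 7, 3],
}).set_index('Date')

g = df.groupby('ID')['Score']

# Side by side, indexed by (ID, Date): last 3 rows vs. last 3 calendar days
rolled = pd.DataFrame({
    'by_rows': g.rolling(3).mean(),     # NaN until a group has 3 rows
    'by_days': g.rolling('3D').mean(),  # window (t - 3 days, t], min_periods=1
})
print(rolled)
```

For ID 1 on 2022-01-05, by_rows averages 1, 2 and 3 (reaching back to 2022-01-02) and gives 2.0, whereas by_days only sees the 2022-01-03 and 2022-01-05 rows and gives 2.5. If you want NaN until three observations fall inside the window, passing min_periods=3 to rolling('3D') is an option.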