Rolling mean with two column identifiers



I have a DataFrame with Date and ID columns and am trying to calculate the rolling mean of Score over three days for each ID separately.

Date        ID  Score
2022-01-02  1   1
2022-01-03  1   2
2022-01-04  1   1
2022-01-05  1   3
2022-01-02  2   5
2022-01-03  2   6
2022-01-04  2   7
2022-01-05  2   3

So far, I only know how to create a rolling mean on one specific column, without taking the second identifier, ID, into account:

df['RollingMean3'] = df['Score'].rolling(3).mean()
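
To see why the ungrouped call is not enough, here is a minimal sketch on the sample data (the variable name `ungrouped` is only illustrative). The window slides over consecutive rows regardless of ID, so the first rows of ID 2 get averaged together with the last rows of ID 1.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05',
             '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 1, 3, 5, 6, 7, 3]})

# Plain rolling window over the whole column, ignoring ID.
ungrouped = df['Score'].rolling(3).mean()

# Row 4 is the first row of ID 2, yet its window still covers
# rows 2 and 3 of ID 1: (1 + 3 + 5) / 3 = 3.0 instead of NaN.
print(ungrouped.iloc[4])
```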

What I am trying to get:

Date        ID  Score  ScoreRollingMean3
2022-01-02  1   1      NaN
2022-01-03  1   2      NaN
2022-01-04  1   1      1.33
2022-01-05  1   3      2
2022-01-02  2   5      NaN
2022-01-03  2   6      NaN
2022-01-04  2   7      6
2022-01-05  2   3      5.33

For reproducibility:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 1, 3, 5, 6, 7, 3]})

Thanks

If the datetimes are consecutive, use DataFrame.groupby together with Series.droplevel:

df['RollingMean3'] = df.groupby('ID')['Score'].rolling(3).mean().droplevel(0)
print(df)
         Date  ID  Score  RollingMean3
0  2022-01-02   1      1           NaN
1  2022-01-03   1      2           NaN
2  2022-01-04   1      1      1.333333
3  2022-01-05   1      3      2.000000
4  2022-01-02   2      5           NaN
5  2022-01-03   2      6           NaN
6  2022-01-04   2      7      6.000000
7  2022-01-05   2      3      5.333333
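
As a hedged illustration of what `droplevel(0)` does here: the grouped rolling result carries a two-level index (ID, original row label), and dropping the first level leaves the original row labels so the assignment can align with `df`. The sketch below also shows an alternative spelling with `transform`, which is not from the answer above but should give the same values under the same assumptions (the names `grouped`, `via_droplevel` and `via_transform` are illustrative).

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05',
             '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'Score': [1, 2, 1, 3, 5, 6, 7, 3]})

# Grouped rolling mean: the result is indexed by (ID, original row label).
grouped = df.groupby('ID')['Score'].rolling(3).mean()
print(grouped.index.nlevels)  # two index levels

# Dropping the ID level restores the original row labels.
via_droplevel = grouped.droplevel(0)

# Alternative: transform returns a Series already aligned to df's index.
via_transform = df.groupby('ID')['Score'].transform(
    lambda s: s.rolling(3).mean())

df['RollingMean3'] = via_transform
print(df)
```

Either form restarts the window for each ID, so the first two rows of every group are NaN.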

A general solution, using a rolling window of 3D (three days), is possible with a DatetimeIndex:

df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df['RollingMean3'] = df.groupby('ID')['Score'].rolling('3D').mean().droplevel(0)
print(df)
            ID  Score  RollingMean3
Date                               
2022-01-02   1      1      1.000000
2022-01-03   1      2      1.500000
2022-01-04   1      1      1.333333
2022-01-05   1      3      2.000000
2022-01-02   2      5      5.000000
2022-01-03   2      6      5.500000
2022-01-04   2      7      6.000000
2022-01-05   2      3      5.333333
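
Note that the first rows are no longer NaN here, because an offset window such as '3D' defaults to min_periods=1. The following sketch, on a made-up single-ID series with a gap in the dates (the data and variable names are assumptions for illustration only), compares a row-count window, a time window, and a time window with min_periods=3, which brings back NaN whenever fewer than three observations fall inside the three days.

```python
import pandas as pd

# Hypothetical scores for one ID; 2022-01-05 and 2022-01-06 are missing.
s = pd.Series(
    [1, 2, 1, 3],
    index=pd.to_datetime(
        ['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-07']))

# Window of 3 rows: ignores the gap, the last value averages 2, 1 and 3.
by_rows = s.rolling(3).mean()

# Window of 3 calendar days: the last window only contains 2022-01-07.
by_days = s.rolling('3D').mean()

# Same time window, but require 3 observations inside it.
strict = s.rolling('3D', min_periods=3).mean()

print(pd.DataFrame({'by_rows': by_rows, 'by_days': by_days, 'strict': strict}))
```

The same min_periods argument can be passed to the grouped call, i.e. `.rolling('3D', min_periods=3)`, if the NaN rows from the expected output in the question are wanted.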
