取最近几天N的平均值,包括NaN



我有这个数据框:

ID      Date  X  123_Var  456_Var  789_Var
A  16-07-19  3    777.0    250.0    810.0
A  17-07-19  9    637.0    121.0    529.0
A  20-07-19  2    295.0    272.0    490.0
A  21-07-19  3    778.0    600.0    544.0
A  22-07-19  6    741.0    792.0    907.0
A  25-07-19  6    435.0    416.0    820.0
A  26-07-19  8    590.0    455.0    342.0
A  27-07-19  6    763.0    476.0    753.0
A  02-08-19  6    717.0    211.0    454.0
A  03-08-19  6    152.0    442.0    475.0
A  05-08-19  6    564.0    340.0    302.0
A  07-08-19  6    105.0    929.0    633.0
A  08-08-19  6    948.0    366.0    586.0
B  07-08-19  4    509.0    690.0    406.0
B  08-08-19  2    413.0    725.0    414.0
B  12-08-19  2    170.0    702.0    912.0
B  13-08-19  3    851.0    616.0    477.0
B  14-08-19  9    475.0    447.0    555.0
B  15-08-19  1    412.0    403.0    708.0
B  17-08-19  2    299.0    537.0    321.0
B  18-08-19  4    310.0    119.0    125.0
C  16-07-19  3    777.0    250.0    810.0
C  17-07-19  9    637.0    121.0    529.0
C  20-07-19  2      NaN      NaN      NaN
C  21-07-19  3      NaN      NaN      NaN
C  22-07-19  6    741.0    792.0    907.0
C  25-07-19  6      NaN      NaN      NaN
C  26-07-19  8    590.0    455.0    342.0
C  27-07-19  6    763.0    476.0    753.0
C  02-08-19  6    717.0    211.0    454.0
C  03-08-19  6      NaN      NaN      NaN
C  05-08-19  6    564.0    340.0    302.0
C  07-08-19  6      NaN      NaN      NaN
C  08-08-19  6    948.0    366.0    586.0

我想显示n最后几天(使用Date列(的mean值,不包括当天的值

我正在使用此代码(我应该怎么做才能解决此问题?

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
n = 4
cols = df.filter(regex='Var').columns
df = df.set_index('Date')
df_ = df.set_index('ID', append=True).swaplevel(1,0)
df1 = df.groupby('ID').rolling(window=f'{n+1}D')[cols].count()
df2 = df.groupby('ID').rolling(window=f'{n+1}D')[cols].mean()
df3 = (df1.mul(df2)
.sub(df_[cols])
.div(df1[cols].sub(1)).add_suffix(f'_{n}')
)
df4 = df_.join(df3)

预期成果:

ID      Date  X  123_Var  456_Var  789_Var   123_Var_4   456_Var_4  789_Var_4
A  16-07-19  3    777.0    250.0    810.0         NaN         NaN        NaN
A  17-07-19  9    637.0    121.0    529.0  777.000000  250.000000      810.0
A  20-07-19  2    295.0    272.0    490.0  707.000000  185.500000      669.5
A  21-07-19  3    778.0    600.0    544.0  466.000000  196.500000      509.5
A  22-07-19  6    741.0    792.0    907.0  536.500000  436.000000      517.0
A  25-07-19  6    435.0    416.0    820.0  759.500000  696.000000      725.5
A  26-07-19  8    590.0    455.0    342.0  588.000000  604.000000      863.5
A  27-07-19  6    763.0    476.0    753.0  512.500000  435.500000      581.0
A  02-08-19  6    717.0    211.0    454.0         NaN         NaN        NaN
A  03-08-19  6    152.0    442.0    475.0  717.000000  211.000000      454.0
A  05-08-19  6    564.0    340.0    302.0  434.500000  326.500000      464.5
A  07-08-19  6    105.0    929.0    633.0  358.000000  391.000000      388.5
A  08-08-19  6    948.0    366.0    586.0  334.500000  634.500000      467.5
B  07-08-19  4    509.0    690.0    406.0         NaN         NaN        NaN
B  08-08-19  2    413.0    725.0    414.0  509.000000  690.000000      406.0
B  12-08-19  2    170.0    702.0    912.0  413.000000  725.000000      414.0
B  13-08-19  3    851.0    616.0    477.0  291.500000  713.500000      663.0
B  14-08-19  9    475.0    447.0    555.0  510.500000  659.000000      694.5
B  15-08-19  1    412.0    403.0    708.0  498.666667  588.333333      648.0
B  17-08-19  2    299.0    537.0    321.0  579.333333  488.666667      580.0
B  18-08-19  4    310.0    119.0    125.0  395.333333  462.333333      528.0
C  16-07-19  3    777.0    250.0    810.0         NaN         NaN        NaN
C  17-07-19  9    637.0    121.0    529.0  777.000000  250.000000      810.0
C  20-07-19  2      NaN      NaN      NaN  707.000000  185.500000      669.5
C  21-07-19  3      NaN      NaN      NaN  637.000000  121.000000      529.0
C  22-07-19  6    741.0    792.0    907.0         NaN         NaN        NaN
C  25-07-19  6      NaN      NaN      NaN  741.000000  792.000000      907.0
C  26-07-19  8    590.0    455.0    342.0  741.000000  792.000000      907.0
C  27-07-19  6    763.0    476.0    753.0  590.000000  455.000000      342.0
C  02-08-19  6    717.0    211.0    454.0         NaN         NaN        NaN
C  03-08-19  6      NaN      NaN      NaN  717.000000  211.000000      454.0
C  05-08-19  6    564.0    340.0    302.0  717.000000  211.000000      454.0
C  07-08-19  6      NaN      NaN      NaN  564.000000  340.000000      302.0
C  08-08-19  6    948.0    366.0    586.0  564.000000  340.000000      302.0

小数点后的数字不是问题。

这些线程可能会有所帮助:

取最近几天 N 的平均值

取最近几天 N 的最小值

尝试:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df1 = (df.groupby('ID')['Date','123_Var','456_Var','789_Var'].rolling('4D', on='Date', closed='left').mean())
dfx = (df.set_index(['ID','Date'])
.join(df1.reset_index().set_index(['ID','Date']), rsuffix='_4')
.reset_index()
.drop('level_1',axis=1))
print(dfx.to_string())
ID       Date  X  123_Var  456_Var  789_Var   123_Var_4   456_Var_4  789_Var_4
0   A 2019-07-16  3    777.0    250.0    810.0         NaN         NaN        NaN
1   A 2019-07-17  9    637.0    121.0    529.0  777.000000  250.000000      810.0
2   A 2019-07-20  2    295.0    272.0    490.0  707.000000  185.500000      669.5
3   A 2019-07-21  3    778.0    600.0    544.0  466.000000  196.500000      509.5
4   A 2019-07-22  6    741.0    792.0    907.0  536.500000  436.000000      517.0
5   A 2019-07-25  6    435.0    416.0    820.0  759.500000  696.000000      725.5
6   A 2019-07-26  8    590.0    455.0    342.0  588.000000  604.000000      863.5
7   A 2019-07-27  6    763.0    476.0    753.0  512.500000  435.500000      581.0
8   A 2019-08-02  6    717.0    211.0    454.0         NaN         NaN        NaN
9   A 2019-08-03  6    152.0    442.0    475.0  717.000000  211.000000      454.0
10  A 2019-08-05  6    564.0    340.0    302.0  434.500000  326.500000      464.5
11  A 2019-08-07  6    105.0    929.0    633.0  358.000000  391.000000      388.5
12  A 2019-08-08  6    948.0    366.0    586.0  334.500000  634.500000      467.5
13  B 2019-08-07  4    509.0    690.0    406.0         NaN         NaN        NaN
14  B 2019-08-08  2    413.0    725.0    414.0  509.000000  690.000000      406.0
15  B 2019-08-12  2    170.0    702.0    912.0  413.000000  725.000000      414.0
16  B 2019-08-13  3    851.0    616.0    477.0  170.000000  702.000000      912.0
17  B 2019-08-14  9    475.0    447.0    555.0  510.500000  659.000000      694.5
18  B 2019-08-15  1    412.0    403.0    708.0  498.666667  588.333333      648.0
19  B 2019-08-17  2    299.0    537.0    321.0  579.333333  488.666667      580.0
20  B 2019-08-18  4    310.0    119.0    125.0  395.333333  462.333333      528.0
21  C 2019-07-16  3    777.0    250.0    810.0         NaN         NaN        NaN
22  C 2019-07-17  9    637.0    121.0    529.0  777.000000  250.000000      810.0
23  C 2019-07-20  2      NaN      NaN      NaN  707.000000  185.500000      669.5
24  C 2019-07-21  3      NaN      NaN      NaN  637.000000  121.000000      529.0
25  C 2019-07-22  6    741.0    792.0    907.0         NaN         NaN        NaN
26  C 2019-07-25  6      NaN      NaN      NaN  741.000000  792.000000      907.0
27  C 2019-07-26  8    590.0    455.0    342.0  741.000000  792.000000      907.0
28  C 2019-07-27  6    763.0    476.0    753.0  590.000000  455.000000      342.0
29  C 2019-08-02  6    717.0    211.0    454.0         NaN         NaN        NaN
30  C 2019-08-03  6      NaN      NaN      NaN  717.000000  211.000000      454.0
31  C 2019-08-05  6    564.0    340.0    302.0  717.000000  211.000000      454.0
32  C 2019-08-07  6      NaN      NaN      NaN  564.000000  340.000000      302.0
33  C 2019-08-08  6    948.0    366.0    586.0  564.000000  340.000000      302.0

最新更新