I have the following DataFrame, which is a list of times:
Name Time Excuse Injured Margin
John 15 nan 0 1
John 18 nan 0 5
John 30 leg injury 1 11
John 16 nan 0 4
John 40 nan 0 18
John 15 nan 0 3
John 22 nan 0 6
I then use a function to get the mean of the previous five times, shifted by one row so that the current time is excluded:
df['last5'] = df.groupby(['Name']).Time.apply(
lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(.5))
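For reference, here is a minimal self-contained sketch of this baseline on the sample data. It uses `transform` instead of `apply`, which is my substitution rather than part of the original snippet: I am assuming a recent pandas, where `apply` may prepend the group key to the result index and break the column assignment. The variable name `baseline` is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John"] * 7,
    "Time": [15, 18, 30, 16, 40, 15, 22],
    "Excuse": [np.nan, np.nan, "leg injury", np.nan, np.nan, np.nan, np.nan],
    "Injured": [0, 0, 1, 0, 0, 0, 0],
    "Margin": [1, 5, 11, 4, 18, 3, 6],
})

# shift() drops the current row from its own window; rolling(5, min_periods=1)
# averages up to five earlier times; the first row has no history, so it gets 0.5
df["last5"] = df.groupby("Name")["Time"].transform(
    lambda s: s.shift().rolling(5, min_periods=1).mean().fillna(0.5)
)

baseline = df["last5"].tolist()
print(baseline)  # [0.5, 15.0, 16.5, 21.0, 19.75, 23.8, 23.8]
```

Note that every row's Time feeds into later averages here, including the 30 and the 40 that I want to leave out.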
This works, but I would like to perform the same calculation while ignoring a row's Time if there is an Excuse, Injured = 1, or Margin > 10.
My expected output is:
Name Time Excuse Injured Margin last5
John 15 nan 0 1 .5
John 18 nan 0 5 15
John 30 leg injury 1 11 16.5
John 16 nan 0 4 16.5
John 40 nan 0 18 16.33
John 15 nan 0 3 16.33
John 22 nan 0 6 16
Can I add a condition to the end of the original function? Thanks in advance!
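You can filter the DataFrame by your criteria before applying the rolling calculation. Assigning the filtered result back leaves NaN in the excluded rows, which you can then back-fill with bfill() as needed: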
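# Rows that count: no excuse, not injured, margin of at most 10
valid = df['Excuse'].isnull() & (df['Injured'] != 1) & (df['Margin'] <= 10)
df['last5'] = (df[valid].groupby(['Name']).Time
               .apply(lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(.5)))
# Excluded rows are NaN at this point; back-fill them within each Name
df['last5'] = df.groupby(['Name'])['last5'].bfill()
df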
Out[1]:
Name Time Excuse Injured Margin last5
0 John 15 NaN 0 1 0.500000
1 John 18 NaN 0 5 15.000000
2 John 30 leg injury 1 11 16.500000
3 John 16 NaN 0 4 16.500000
4 John 40 NaN 0 18 16.333333
5 John 15 NaN 0 3 16.333333
6 John 22 NaN 0 6 16.000000
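If you want to run this end to end, the following is a minimal self-contained sketch of the same filter-then-back-fill idea on the sample data. It swaps `apply` for `transform`, which is an assumption on my part: in recent pandas versions `apply` may add the group key to the result index, and the assignment back to `df` then no longer aligns. The names `valid` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John"] * 7,
    "Time": [15, 18, 30, 16, 40, 15, 22],
    "Excuse": [np.nan, np.nan, "leg injury", np.nan, np.nan, np.nan, np.nan],
    "Injured": [0, 0, 1, 0, 0, 0, 0],
    "Margin": [1, 5, 11, 4, 18, 3, 6],
})

# True for rows whose Time should count: no excuse, not injured, margin <= 10
valid = df["Excuse"].isna() & (df["Injured"] != 1) & (df["Margin"] <= 10)

# Rolling mean over the valid rows only; the result keeps the original row
# labels, so the excluded rows come out as NaN after assignment
df["last5"] = (
    df[valid]
    .groupby("Name")["Time"]
    .transform(lambda s: s.shift().rolling(5, min_periods=1).mean().fillna(0.5))
)

# Give each excluded row the value of the next valid row in the same group,
# which is the mean of the valid times seen before it
df["last5"] = df.groupby("Name")["last5"].bfill()

result = df["last5"].round(2).tolist()
print(result)  # [0.5, 15.0, 16.5, 16.5, 16.33, 16.33, 16.0]
```

One caveat with the back-fill: if the last row or rows of a group are excluded, there is no later valid row to copy from, so they stay NaN and would need separate handling.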