How to skip cells in a lambda shift/rolling function in pandas based on conditions in multiple columns



I have the following dataframe, which is a list of times:

Name      Time     Excuse     Injured       Margin
John       15       nan           0            1
John       18       nan           0            5
John       30       leg injury    1            11
John       16       nan           0            4
John       40       nan           0            18
John       15       nan           0            3
John       22       nan           0            6 

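I then use this function to get the average of the previous 5 times (shifted by one row so the current time is not included):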

df['last5'] = df.groupby(['Name']).Time.apply(
    lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(.5))
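
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and runs the same calculation. The frame is typed in by hand from the table above, and the name `baseline` is only for illustration. It uses `transform` in place of `apply`, which is my substitution: as far as I know, on some pandas versions `groupby(...).apply` returns a result that also carries the group key in its index, and `transform` keeps the original row index so the result lines up with `df`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["John"] * 7,
    "Time": [15, 18, 30, 16, 40, 15, 22],
    "Excuse": [np.nan, np.nan, "leg injury", np.nan, np.nan, np.nan, np.nan],
    "Injured": [0, 0, 1, 0, 0, 0, 0],
    "Margin": [1, 5, 11, 4, 18, 3, 6],
})

# Per name: shift by one row so the current Time is left out, then average
# up to five previous times; the first row has no history and gets 0.5.
baseline = df.groupby("Name")["Time"].transform(
    lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(0.5)
)
print(baseline.tolist())
```

With no filtering, the row with Time 16 gets 21 (the mean of 15, 18 and 30), because the 30 from the injured run is still counted.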

This works, but I would like to perform the same calculation while ignoring a Time whenever that row has an Excuse, Injured = 1, or Margin > 10.

My expected output is:

Name      Time       Excuse     Injured     Margin     last5
John       15                     0            1         .5
John       18                     0            5         15
John       30       leg injury    1            11        16.5
John       16                     0            4         16.5
John       40                     0            18        16.33
John       15                     0            3         16.33
John       22                     0            6         16

Can I add a condition to the end of my original function? Thanks in advance!

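  1. Before applying the rolling calculation, filter the dataframe by your criteria.
  2. Then use bfill() to backfill the NaN values left in the rows that were filtered out: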

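# Step 1: compute the shifted rolling mean on the valid rows only
df['last5'] = (
    df[df['Excuse'].isnull() & (df['Injured'] != 1) & (df['Margin'] <= 10)]
    .groupby(['Name']).Time
    .apply(lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(.5))
)
# Step 2: the filtered-out rows are NaN; take the next valid value per name
df['last5'] = df.groupby(['Name'])['last5'].bfill()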
df
Out[1]: 
   Name  Time      Excuse  Injured  Margin      last5
0  John    15         NaN        0       1   0.500000
1  John    18         NaN        0       5  15.000000
2  John    30  leg injury        1      11  16.500000
3  John    16         NaN        0       4  16.500000
4  John    40         NaN        0      18  16.333333
5  John    15         NaN        0       3  16.333333
6  John    22         NaN        0       6  16.000000
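
The two steps can also be written as a self-contained sketch, shown here with the sample data typed in by hand. The mask name `valid` is hypothetical, and `transform` is used in place of `apply` (my substitution, so that the result keeps the original row index on any pandas version I am aware of).

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["John"] * 7,
    "Time": [15, 18, 30, 16, 40, 15, 22],
    "Excuse": [np.nan, np.nan, "leg injury", np.nan, np.nan, np.nan, np.nan],
    "Injured": [0, 0, 1, 0, 0, 0, 0],
    "Margin": [1, 5, 11, 4, 18, 3, 6],
})

# Rows whose Time should count towards the average:
# no excuse, not injured, and a margin of at most 10.
valid = df["Excuse"].isnull() & (df["Injured"] != 1) & (df["Margin"] <= 10)

# Step 1: shifted rolling mean over the valid rows only. The result carries
# only the index labels of those rows, so the assignment leaves NaN in the
# rows that were filtered out.
df["last5"] = (
    df[valid]
    .groupby("Name")["Time"]
    .transform(lambda x: x.shift().rolling(5, min_periods=1).mean().fillna(0.5))
)

# Step 2: each filtered-out row takes the value of the next valid row
# within the same name.
df["last5"] = df.groupby("Name")["last5"].bfill()
print(df)
```

One thing to watch: bfill() can only fill a filtered-out row if a later valid row exists in the same group, so a filtered-out row at the very end of a group would keep NaN.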
