Pandas: using map or apply to create a new column from adjustments stored in a dictionary



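I have data from a sporting event, and I know that each home arena has a bias I want to adjust for. I have created a dictionary where the arena is the key and the value is the adjustment I want to apply.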

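So for each row, I want to take the home team, look up its adjustment value, and subtract it from the distance column. I have the code below, but I can't seem to get it to work.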

# Making the dictionary, this is working properly
teams = df.home_team.unique().tolist()
adj_shot_dict = {}
for team in teams:
    df_temp = df[df.home_team == team]
    average = round(df_temp.event_distance.mean(), 2)
    adj_shot_dict[team] = average

def make_adjustment(df):
    team = df.home_team
    distance = df.event_distance
    adj_dist = distance - adj_shot_dict[team]
    return adj_dist

df['adj_dist'] = df['event_distance'].apply(make_adjustment)

IIUC, you already have the dictionary and simply want to subtract the matching adj_shot_dict value from the event_distance column:

df['adj_dist'] = df['event_distance'] - df['home_team'].map(adj_shot_dict)
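The original attempt likely fails because `df['event_distance'].apply(...)` hands the function one scalar distance at a time rather than a whole row, so there is no `home_team` to look up. A minimal sketch on made-up data (the team names and distances are assumptions, not taken from the question) compares the vectorized `map` lookup with a row-wise `apply(axis=1)`:

```python
import pandas as pd

# Hypothetical sample data; not from the original question.
df = pd.DataFrame({
    "home_team": ["team1", "team2", "team2", "team3"],
    "event_distance": [10.0, 20.0, 50.0, 60.0],
})

# Per-arena adjustment: mean event_distance for each home team.
adj_shot_dict = (
    df.groupby("home_team")["event_distance"].mean().round(2).to_dict()
)

# Vectorized: look up each row's adjustment by home team, then subtract.
df["adj_dist"] = df["event_distance"] - df["home_team"].map(adj_shot_dict)

# Row-wise equivalent: axis=1 passes each row (a Series) to the function.
def make_adjustment(row):
    return row["event_distance"] - adj_shot_dict[row["home_team"]]

df["adj_dist_apply"] = df.apply(make_adjustment, axis=1)
```

One difference worth keeping in mind: with `map`, a team missing from the dictionary produces NaN, whereas the direct dictionary lookup in the row-wise version raises a KeyError.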

Old answer

Group by home_team, compute the mean of event_distance for each group, then subtract that result from event_distance:

df['adj_dist'] = df['event_distance'] \
    - df.groupby('home_team')['event_distance'] \
        .transform('mean').round(2)
# OR
# group_keys=False keeps the original row index so the result aligns with df
df['adj_dist'] = df.groupby('home_team', group_keys=False)['event_distance'] \
    .apply(lambda x: x - x.mean().round(2))
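The subtraction works because `transform` returns one value per row, indexed like the original frame, rather than one value per group. A small sketch with made-up data (the values are assumptions chosen for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({
    "home_team": ["team1", "team2", "team2", "team3"],
    "event_distance": [10.0, 20.0, 50.0, 60.0],
})

grouped = df.groupby("home_team")["event_distance"]

# Aggregation: one value per group, indexed by team name.
per_team = grouped.mean()

# Transform: the group mean repeated for every row, indexed like df.
per_row = grouped.transform("mean")

# Because per_row shares df's index, the subtraction aligns row by row.
df["adj_dist"] = df["event_distance"] - per_row.round(2)
```

Here `per_team` has three entries while `per_row` has four, one for each row of `df`.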

>>> len(df)
60000
>>> df.sample(5)
  home_team  event_distance
5     team3              60
4     team2              50
1     team2              20
1     team2              20
0     team1              10
def loop():
    teams = df.home_team.unique().tolist()
    adj_shot_dict = {}
    for team in teams:
        df_temp = df[df.home_team == team]
        average = round(df_temp.event_distance.mean(), 2)
        adj_shot_dict[team] = average

def loop2():
    df.groupby('home_team')['event_distance'].transform('mean').round(2)
>>> %timeit loop()
13.5 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit loop2()
3.62 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Total process
>>> %timeit df['event_distance'] - df.groupby('home_team')['event_distance'].transform('mean').round(2)
3.7 ms ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
