I have a DataFrame of football data in which each row represents one match. The DataFrame has the following columns: "Date", "HomeTeam", "AwayTeam", "Points_HomeTeam" and "Points_AwayTeam".
+--------------------------------------------------------------------------+
| 'Date'      'HomeTeam'  'AwayTeam'  'Points_HomeTeam'  'Points_AwayTeam' |
+--------------------------------------------------------------------------+
| 2000-08-19  Charlton    Man City    0                  3                 |
| 2000-08-19  Chelsea     Arsenal     1                  1                 |
| 2000-08-23  Coventry    Man City    3                  0                 |
| 2000-08-25  Man City    Liverpool   1                  1                 |
| 2000-08-28  Derby       Man City    1                  1                 |
| 2000-08-31  Leeds       Chelsea     3                  0                 |
| 2000-08-31  Man City    Everton     3                  0                 |
+--------------------------------------------------------------------------+
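I want to add a column that shows the sum of the points the home team earned in its two most recent away games, i.e. the sum of the "Points_AwayTeam" values from the two previous rows in which "AwayTeam" equals the "HomeTeam" of the current row.
For example, in the table below, the new column for the first occurrence of "Man City" in the "HomeTeam" column has the value 3 (the sum of "Points_AwayTeam" for the two previous occurrences of "Man City" in the "AwayTeam" column, i.e. 0 + 3). Similarly, the new column for the second occurrence of "Man City" in the "HomeTeam" column has the value 1 (1 + 0). All other rows are "NA", because no other home team has previously appeared twice in the "AwayTeam" column.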
+----------------------------------------------------------------------------------------+
| 'Date'      'HomeTeam'  'AwayTeam'  'Points_HomeTeam'  'Points_AwayTeam'  'New Column' |
+----------------------------------------------------------------------------------------+
| 2000-08-19  Charlton    Man City    0                  3                  NA           |
| 2000-08-19  Chelsea     Arsenal     1                  1                  NA           |
| 2000-08-23  Coventry    Man City    3                  0                  NA           |
| 2000-08-25  Man City    Liverpool   1                  1                  3            |
| 2000-08-28  Derby       Man City    1                  1                  NA           |
| 2000-08-31  Leeds       Chelsea     3                  0                  NA           |
| 2000-08-31  Man City    Everton     3                  0                  1            |
+----------------------------------------------------------------------------------------+
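To make the definition concrete, here is a brute-force sketch that computes the desired column row by row. It is only a reference for the expected values, not an efficient solution. It assumes "Date" is parsed as a datetime and the rows are already in date order. The helper name `last_two_away_points` is made up for this illustration.

```python
import pandas as pd

# The sample data from the table above.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2000-08-19", "2000-08-19", "2000-08-23", "2000-08-25",
        "2000-08-28", "2000-08-31", "2000-08-31",
    ]),
    "HomeTeam": ["Charlton", "Chelsea", "Coventry", "Man City",
                 "Derby", "Leeds", "Man City"],
    "AwayTeam": ["Man City", "Arsenal", "Man City", "Liverpool",
                 "Man City", "Chelsea", "Everton"],
    "Points_HomeTeam": [0, 1, 3, 1, 1, 3, 3],
    "Points_AwayTeam": [3, 1, 0, 1, 1, 0, 0],
})

def last_two_away_points(row):
    # Earlier matches in which this row's home team played away.
    earlier = df[(df["AwayTeam"] == row["HomeTeam"]) & (df["Date"] < row["Date"])]
    if len(earlier) < 2:
        return float("nan")  # fewer than two previous away games
    return earlier["Points_AwayTeam"].tail(2).sum()

df["New Column"] = df.apply(last_two_away_points, axis=1)
print(df[["Date", "HomeTeam", "New Column"]])
```

Only the two "Man City" home rows get a value (3 and 1); every other row is NaN.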
I computed the sum of the points the home team earned in its two most recent home games with the following code:
rolling_games = 2  # number of previous home games to sum
f = lambda x: x.rolling(window=rolling_games, min_periods=rolling_games).sum().shift()
df['HomeTeam_HomePoints'] = df.groupby('HomeTeam')['Points_HomeTeam'].apply(f).reset_index(drop=True, level=0)
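For reference, a self-contained sketch of the same idea using `transform`, which keeps the original index without a `reset_index` step. The data here is hypothetical (one team, four home games in date order) and is chosen only to show what `shift()` does: the current game is excluded from its own sum.

```python
import pandas as pd

# Hypothetical data: one team's four home games, in date order.
df = pd.DataFrame({
    "HomeTeam": ["Leeds"] * 4,
    "Points_HomeTeam": [3, 1, 0, 3],
})

rolling_games = 2

# Rolling sum over the last two home games, shifted by one row so that
# each row only sees games played before it.
df["HomeTeam_HomePoints"] = (
    df.groupby("HomeTeam")["Points_HomeTeam"]
      .transform(lambda s: s.rolling(rolling_games).sum().shift())
)
print(df)
```

The first two rows are NaN because fewer than two earlier home games exist. The third row is 3 + 1 = 4 and the fourth is 1 + 0 = 1.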
How can I compute a rolling sum across rows based on the values in a separate column?
Thanks a lot!
Here is one solution:
import pandas as pd

# merge_asof needs a datetime (or numeric) key, sorted in ascending order.
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date")

away = df[["Date", "AwayTeam", "Points_AwayTeam"]].copy()
# Rolling sum of the last two away results for each team.
away["roll_sum"] = away.groupby("AwayTeam")["Points_AwayTeam"].transform(lambda x: x.rolling(2).sum())
# For every match, we now have to find the latest rolling sum
# of 'away' for the 'home' team.
#
# We use merge_asof to do that:
# first, it matches home teams on the left to away teams
# on the right (via left_by and right_by);
# then, for every date on the left, it looks for the closest
# earlier-or-equal date on the right (via the 'on' argument).
res = pd.merge_asof(df, away, on="Date", left_by="HomeTeam", right_by="AwayTeam", suffixes=("", "_roll"))
res = res.drop(["AwayTeam_roll", "Points_AwayTeam_roll"], axis=1)
print(res)
Output:
        Date  HomeTeam   AwayTeam  Points_HomeTeam  Points_AwayTeam  roll_sum
0 2000-08-19  Charlton   Man-City                0                3       NaN
1 2000-08-19   Chelsea    Arsenal                1                1       NaN
2 2000-08-23  Coventry   Man-City                3                0       NaN
3 2000-08-25  Man-City  Liverpool                1                1       3.0
4 2000-08-28     Derby   Man-City                1                1       NaN
5 2000-08-31     Leeds    Chelsea                3                0       NaN
6 2000-08-31  Man-City    Everton                3                0       1.0
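The same approach can be wrapped in a small function with the number of games as a parameter. This is a sketch rather than a definitive implementation: the names `add_recent_away_points` and `n_games` are made up here, and passing `allow_exact_matches=False` is an extra assumption (only away games on strictly earlier dates should count), which gives the same result on this sample.

```python
import pandas as pd

def add_recent_away_points(df, n_games=2):
    """Hypothetical helper: points from the home team's last n away games."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # merge_asof requires both frames to be sorted by the 'on' key.
    out = out.sort_values("Date", kind="mergesort").reset_index(drop=True)

    away = out[["Date", "AwayTeam", "Points_AwayTeam"]].copy()
    away["roll_sum"] = (
        away.groupby("AwayTeam")["Points_AwayTeam"]
            .transform(lambda s: s.rolling(n_games).sum())
    )

    res = pd.merge_asof(
        out, away,
        on="Date",
        left_by="HomeTeam", right_by="AwayTeam",
        suffixes=("", "_roll"),
        allow_exact_matches=False,  # only strictly earlier away games
    )
    # Keep the original columns plus the new one.
    return res[list(out.columns) + ["roll_sum"]]

# The sample data from the question.
df = pd.DataFrame({
    "Date": ["2000-08-19", "2000-08-19", "2000-08-23", "2000-08-25",
             "2000-08-28", "2000-08-31", "2000-08-31"],
    "HomeTeam": ["Charlton", "Chelsea", "Coventry", "Man City",
                 "Derby", "Leeds", "Man City"],
    "AwayTeam": ["Man City", "Arsenal", "Man City", "Liverpool",
                 "Man City", "Chelsea", "Everton"],
    "Points_HomeTeam": [0, 1, 3, 1, 1, 3, 3],
    "Points_AwayTeam": [3, 1, 0, 1, 1, 0, 0],
})

res = add_recent_away_points(df)
print(res)
```

Calling `add_recent_away_points(df, n_games=3)` would sum the last three away games instead. On this sample only the second "Man City" home row would then have a value.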