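I have a Python program that prints discrepancies in odds scraped from various bookmakers. It does this by appending the odds to a pandas DataFrame. Each time the program runs, I want it to consult a log of earlier DataFrame output so that it doesn't print discrepancies it has already reported. The log would record the DataFrame's "Horse" column. When the program prints the DataFrame, it would check the log to see whether any name in the "Horse" column has already appeared.
Here is a sample of the DataFrame output: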
      Race        Horse             Bookmaker      Odds   AvgOdds
13    Mackay R1   Which Lily        SportsBetting  2.45   2.04
15    Mackay R1   Which Lily        Bet365         2.40   2.04
17    Mackay R1   Molongle Drifter  Ubet           9.00   7.26
18    Mackay R1   Molongle Drifter  BetEasy        8.50   7.26
19    Mackay R1   Molongle Drifter  Neds           8.50   7.26
...   ...         ...               ...            ...    ...
1545  Mackay R10  Cold Power        SportsBetting  8.10   6.39
1547  Mackay R10  Cold Power        Bet365         8.00   6.39
1548  Mackay R10  All Star Rocket   Ubet           7.20   2.98
1560  Mackay R10  Dawlish           Sportsbet      14.00  11.65
1561  Mackay R10  Dawlish           SportsBetting  15.20  11.65
Here is the part of my code that deals with the DataFrame:
cols1 = ['Race', 'Horse', 'Bookmaker', 'Odds']
df1 = pd.DataFrame(data=data, columns=cols1)
cols2 = ['Race', 'Horse', 'Bookmaker', 'AvgOdds']
df2 = pd.DataFrame(data=data, columns=cols2)
df3 = df2.groupby(by='Horse', sort=False).mean()
df3 = df3.reset_index()
df4 = round(df3,2)
dfmerge = pd.merge(df1,df4,on='Horse',how='inner')
dfmerge2 = dfmerge[dfmerge['Odds']>dfmerge['AvgOdds']*1.15]
dfmerge3 = dfmerge2['Horse']
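I'd suggest extending your initial DataFrame with a column that tracks which events have already been reported, and updating that column each time you report.
Then, as you add more rows to the dataset, you can rerun the program without seeing previously reported data.
So, given this data (note the extra column, which you must initially set to 'N'):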
      Race        Horse             Bookmaker      Odds   Reported
13    Mackay R1   Which Lily        SportsBetting  2.45   N
15    Mackay R1   Which Lily        Bet365         2.40   N
17    Mackay R1   Molongle Drifter  Ubet           9.00   N
18    Mackay R1   Molongle Drifter  BetEasy        8.50   N
19    Mackay R1   Molongle Drifter  Neds           8.50   N
...   ...         ...               ...            ...    ...
1545  Mackay R10  Cold Power        SportsBetting  8.10   N
1547  Mackay R10  Cold Power        Bet365         8.00   N
1548  Mackay R10  All Star Rocket   Ubet           7.20   N
1560  Mackay R10  Dawlish           Sportsbet      14.00  N
1561  Mackay R10  Dawlish           SportsBetting  27.20  N
Use this code:
import pandas as pd

# Previously...
base_data = pd.DataFrame(...)

# Refactored code from example given
cols_raw = ['Race', 'Horse', 'Bookmaker', 'Odds']
raw_data = base_data[cols_raw]

cols_mean = ['Race', 'Horse', 'Odds']
mean_data = (
    base_data[cols_mean]
    # I assume this was meant to be by race and horse...
    .groupby(by=['Race', 'Horse'], sort=False)
    .mean()
    .reset_index()
    .rename(columns={'Odds': 'AvgOdds'})
    # Round to 2 decimal places, as in your original code
    .round(2)
)

report = pd.merge(raw_data, mean_data, on=['Race', 'Horse'], how='inner')
# Keep only the rows whose odds are more than 15% above the average.
# report stays a DataFrame so the 'Horse' column can still be filtered below.
report = report[report['Odds'] > report['AvgOdds'] * 1.15]

# Filter out any horses that have already been reported on:
pre_reported_horses = set(base_data.loc[base_data['Reported'] == 'Y', 'Horse'])
report = report[~report['Horse'].isin(pre_reported_horses)]

# And then update the Reported column for next time you run the code
reported_horses = pre_reported_horses | set(report['Horse'])
base_data.loc[base_data['Horse'].isin(reported_horses), 'Reported'] = 'Y'
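To check the logic end to end, here is a minimal self-contained sketch that runs the same steps on four rows taken from the table above. The function name `report_new_discrepancies` and its `threshold` parameter are my own inventions for illustration, not part of your code, and I'm assuming the average should be taken per race and horse.

```python
import pandas as pd


def report_new_discrepancies(base_data, threshold=1.15):
    """Return not-yet-reported rows whose odds exceed the race/horse average
    by `threshold`, and flag those horses as reported in `base_data` (in place).

    Hypothetical helper: the name and signature are assumptions.
    """
    mean_data = (
        base_data.groupby(['Race', 'Horse'], sort=False)['Odds']
        .mean()
        .round(2)
        .rename('AvgOdds')
        .reset_index()
    )
    report = base_data.merge(mean_data, on=['Race', 'Horse'], how='inner')
    report = report[report['Odds'] > report['AvgOdds'] * threshold]

    # Drop horses that were flagged on an earlier run.
    already_reported = set(base_data.loc[base_data['Reported'] == 'Y', 'Horse'])
    report = report[~report['Horse'].isin(already_reported)]

    # Flag the horses reported this time so the next run skips them.
    newly_reported = set(report['Horse'])
    base_data.loc[base_data['Horse'].isin(newly_reported), 'Reported'] = 'Y'
    return report[['Race', 'Horse', 'Bookmaker', 'Odds', 'AvgOdds']]


base_data = pd.DataFrame({
    'Race': ['Mackay R1', 'Mackay R1', 'Mackay R10', 'Mackay R10'],
    'Horse': ['Which Lily', 'Which Lily', 'Dawlish', 'Dawlish'],
    'Bookmaker': ['SportsBetting', 'Bet365', 'Sportsbet', 'SportsBetting'],
    'Odds': [2.45, 2.40, 14.00, 27.20],
    'Reported': ['N', 'N', 'N', 'N'],
})

first_run = report_new_discrepancies(base_data)   # one row: Dawlish at 27.20
second_run = report_new_discrepancies(base_data)  # empty: Dawlish is now flagged
print(first_run)
print(second_run.empty)
```

Wrapping the steps in a function is just a convenience here; it makes it easy to call the report twice and confirm that the second call returns nothing new.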
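You can then append new data to the base DataFrame with Reported set to 'N' and rerun the report without seeing duplicate reports of unusual odds.
For example, if you reported on the horse "Dawlish", your updated base_data DataFrame should now look like this: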
      Race        Horse             Bookmaker      Odds   Reported
13    Mackay R1   Which Lily        SportsBetting  2.45   N
15    Mackay R1   Which Lily        Bet365         2.40   N
17    Mackay R1   Molongle Drifter  Ubet           9.00   N
18    Mackay R1   Molongle Drifter  BetEasy        8.50   N
19    Mackay R1   Molongle Drifter  Neds           8.50   N
...   ...         ...               ...            ...    ...
1545  Mackay R10  Cold Power        SportsBetting  8.10   N
1547  Mackay R10  Cold Power        Bet365         8.00   N
1548  Mackay R10  All Star Rocket   Ubet           7.20   N
1560  Mackay R10  Dawlish           Sportsbet      14.00  Y
1561  Mackay R10  Dawlish           SportsBetting  27.20  Y
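Since you want the log to survive between runs, the Reported column has to be saved somewhere. One possible way, sketched below, is to keep base_data in a CSV file. The file name `base_data.csv` and the helper names `load_base_data`, `append_new_rows` and `save_base_data` are assumptions of mine, and the demo writes to a temporary directory so it leaves nothing on disk.

```python
import tempfile
from pathlib import Path

import pandas as pd

COLUMNS = ['Race', 'Horse', 'Bookmaker', 'Odds', 'Reported']


def load_base_data(path):
    """Load the saved log, or start an empty one on the first run."""
    if Path(path).exists():
        return pd.read_csv(path)
    return pd.DataFrame(columns=COLUMNS)


def append_new_rows(base_data, new_rows):
    """Add freshly scraped rows, flagged as not yet reported."""
    new_rows = new_rows.assign(Reported='N')[COLUMNS]
    if base_data.empty:
        return new_rows.reset_index(drop=True)
    return pd.concat([base_data, new_rows], ignore_index=True)


def save_base_data(base_data, path):
    """Write the log back to disk for the next run."""
    base_data.to_csv(path, index=False)


# Demo in a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / 'base_data.csv'

    scraped = pd.DataFrame({
        'Race': ['Mackay R10', 'Mackay R10', 'Mackay R10'],
        'Horse': ['Cold Power', 'Dawlish', 'Dawlish'],
        'Bookmaker': ['Bet365', 'Sportsbet', 'SportsBetting'],
        'Odds': [8.00, 14.00, 27.20],
    })
    base_data = append_new_rows(load_base_data(log_path), scraped)

    # Stand-in for the report step, which would flag Dawlish as reported.
    base_data.loc[base_data['Horse'] == 'Dawlish', 'Reported'] = 'Y'
    save_base_data(base_data, log_path)

    # The next run starts from the saved flags.
    reloaded = load_base_data(log_path)
    print(reloaded)
```

New rows for a horse that was already reported come in with 'N', but the report still skips them, because the filter looks at every row for that horse and finds the earlier 'Y'.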