How can I use a log with a DataFrame so the same output isn't printed on every run?



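I have a Python program that prints odds discrepancies scraped from various bookmakers. It does this by appending the odds to a pandas DataFrame. I want to keep a log of the DataFrame output across runs so that the program doesn't print the same discrepancy twice. The log would record the "Horse" column of the DataFrame. When the program prints the DataFrame, it would consult the log to check whether any name in the "Horse" column has already appeared.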

Here is an example of the DataFrame output:

            Race             Horse      Bookmaker   Odds  AvgOdds
13     Mackay R1        Which Lily  SportsBetting   2.45     2.04
15     Mackay R1        Which Lily         Bet365   2.40     2.04
17     Mackay R1  Molongle Drifter           Ubet   9.00     7.26
18     Mackay R1  Molongle Drifter        BetEasy   8.50     7.26
19     Mackay R1  Molongle Drifter           Neds   8.50     7.26
...          ...               ...            ...    ...      ...
1545  Mackay R10        Cold Power  SportsBetting   8.10     6.39
1547  Mackay R10        Cold Power         Bet365   8.00     6.39
1548  Mackay R10   All Star Rocket           Ubet   7.20     2.98
1560  Mackay R10           Dawlish      Sportsbet  14.00    11.65
1561  Mackay R10           Dawlish  SportsBetting  15.20    11.65

Here is the part of my code that deals with the DataFrame:

cols1 = ['Race', 'Horse', 'Bookmaker', 'Odds']
df1 = pd.DataFrame(data=data, columns=cols1)
cols2 = ['Race', 'Horse', 'Bookmaker', 'AvgOdds']
df2 = pd.DataFrame(data=data, columns=cols2)
df3 = df2.groupby(by='Horse', sort=False).mean()
df3 = df3.reset_index()
df4 = round(df3,2)
dfmerge = pd.merge(df1,df4,on='Horse',how='inner')
dfmerge2 = dfmerge[dfmerge['Odds']>dfmerge['AvgOdds']*1.15]
dfmerge3 = dfmerge2['Horse']

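I suggest extending your base DataFrame with a column that marks rows as previously reported, and updating that column each time you report.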

Then, as you add more rows to the dataset, you can rerun the program without seeing data that has already been reported.
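The flag only helps across runs if the base DataFrame itself survives between them. As a minimal sketch, assuming you are happy to store it in a CSV file, something like the following could work. The helper names `load_base_data` and `save_base_data` and the file name `base_data.csv` are made up for illustration, and the demo writes to a temporary directory:

```python
import os
import tempfile

import pandas as pd

COLUMNS = ['Race', 'Horse', 'Bookmaker', 'Odds', 'Reported']


def load_base_data(path: str) -> pd.DataFrame:
    """Load the stored frame, or start with an empty one on the first run."""
    if os.path.exists(path):
        return pd.read_csv(path)
    return pd.DataFrame(columns=COLUMNS)


def save_base_data(base_data: pd.DataFrame, path: str) -> None:
    """Write the frame, including the Reported flags, back to disk."""
    base_data.to_csv(path, index=False)


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'base_data.csv')  # hypothetical file name
    started_empty = load_base_data(path).empty  # no file yet on the first run
    base_data = pd.DataFrame({
        'Race': ['Mackay R10'],
        'Horse': ['Dawlish'],
        'Bookmaker': ['SportsBetting'],
        'Odds': [27.20],
        'Reported': ['Y'],
    })
    save_base_data(base_data, path)
    reloaded = load_base_data(path)  # what the next run would see
```

In a real program you would load at the start, run the report, and save at the end, so the 'Y' flags carry over to the next run.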

So, given this data (note the extra column, which you must initially set to 'N'):

            Race             Horse      Bookmaker   Odds  Reported
13     Mackay R1        Which Lily  SportsBetting   2.45         N
15     Mackay R1        Which Lily         Bet365   2.40         N
17     Mackay R1  Molongle Drifter           Ubet   9.00         N
18     Mackay R1  Molongle Drifter        BetEasy   8.50         N
19     Mackay R1  Molongle Drifter           Neds   8.50         N
...          ...               ...            ...    ...       ...
1545  Mackay R10        Cold Power  SportsBetting   8.10         N
1547  Mackay R10        Cold Power         Bet365   8.00         N
1548  Mackay R10   All Star Rocket           Ubet   7.20         N
1560  Mackay R10           Dawlish      Sportsbet  14.00         N
1561  Mackay R10           Dawlish  SportsBetting  27.20         N

And using this code:

import pandas as pd

# Previously...
base_data = pd.DataFrame(...)

# Refactored code from the example given
cols_raw = ['Race', 'Horse', 'Bookmaker', 'Odds']
raw_data = base_data[cols_raw]
cols_mean = ['Race', 'Horse', 'Odds']
mean_data = (
    base_data[cols_mean]
    # I assume this was meant to be by race and horse...
    .groupby(by=['Race', 'Horse'], sort=False)
    .mean()
    .reset_index()
    .rename(columns={'Odds': 'AvgOdds'})
)
# Round to 2 decimal places, as in the original code
mean_data = mean_data.round(2)
report = pd.merge(raw_data, mean_data, on=['Race', 'Horse'], how='inner')
# Keep the rows whose odds are more than 15% above the average.
# The full rows are kept so the 'Horse' column is still available below.
report = report[report['Odds'] > report['AvgOdds'] * 1.15]

# Filter out any horses that have already been reported on:
pre_reported_horses = set(base_data.loc[base_data['Reported'] == 'Y', 'Horse'])
report = report[~report['Horse'].isin(pre_reported_horses)]

# And then update the Reported column for the next time you run the code
reported_horses = pre_reported_horses | set(report['Horse'])
base_data.loc[base_data['Horse'].isin(reported_horses), 'Reported'] = 'Y'

You can then append new data to the base DataFrame with Reported set to 'N' and rerun the report without seeing repeated reports of the same unusual odds.
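Appending a new scrape might look roughly like this. It is only a sketch: `append_scrape` is a hypothetical helper, and the two new "Neds" rows are invented values used purely to show the behaviour. Note that new rows for a horse that already has a 'Y' row are still skipped, because the filter works on horse names rather than on individual rows:

```python
import pandas as pd


def append_scrape(base_data: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Return base_data with new_rows appended and flagged as not yet reported."""
    new_rows = new_rows.assign(Reported='N')
    return pd.concat([base_data, new_rows], ignore_index=True)


# Dawlish was reported on an earlier run, so its rows carry 'Y'.
base_data = pd.DataFrame({
    'Race': ['Mackay R10', 'Mackay R10'],
    'Horse': ['Dawlish', 'Dawlish'],
    'Bookmaker': ['Sportsbet', 'SportsBetting'],
    'Odds': [14.00, 27.20],
    'Reported': ['Y', 'Y'],
})

# Invented rows standing in for a fresh scrape.
new_rows = pd.DataFrame({
    'Race': ['Mackay R10', 'Mackay R10'],
    'Horse': ['Dawlish', 'Cold Power'],
    'Bookmaker': ['Neds', 'Neds'],
    'Odds': [26.00, 8.00],
})

base_data = append_scrape(base_data, new_rows)

# Any horse with at least one 'Y' row is excluded, even if its newest rows are 'N'.
already_reported = set(base_data.loc[base_data['Reported'] == 'Y', 'Horse'])
candidates = base_data[~base_data['Horse'].isin(already_reported)]
print(candidates)
```

If you would rather report a horse again when a different bookmaker shows unusual odds, you would need to filter on a combination such as horse and bookmaker instead. That is a different design from the one described here.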

For example, if you reported on the horse "Dawlish", your updated base_data DataFrame should now look like this:

            Race             Horse      Bookmaker   Odds  Reported
13     Mackay R1        Which Lily  SportsBetting   2.45         N
15     Mackay R1        Which Lily         Bet365   2.40         N
17     Mackay R1  Molongle Drifter           Ubet   9.00         N
18     Mackay R1  Molongle Drifter        BetEasy   8.50         N
19     Mackay R1  Molongle Drifter           Neds   8.50         N
...          ...               ...            ...    ...       ...
1545  Mackay R10        Cold Power  SportsBetting   8.10         N
1547  Mackay R10        Cold Power         Bet365   8.00         N
1548  Mackay R10   All Star Rocket           Ubet   7.20         N
1560  Mackay R10           Dawlish      Sportsbet  14.00         Y
1561  Mackay R10           Dawlish  SportsBetting  27.20         Y
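To check this end to end, here is a small self-contained sketch that runs the same logic twice over the ten rows visible in the table above (the elided rows are left out). The function name `report_new_discrepancies` and its `threshold` parameter are my own choices for the example and do not appear in the original code:

```python
import pandas as pd


def report_new_discrepancies(base_data: pd.DataFrame, threshold: float = 1.15) -> pd.DataFrame:
    """Return unreported rows with odds above threshold * average, flagging them in base_data."""
    mean_data = (
        base_data.groupby(['Race', 'Horse'], sort=False)['Odds']
        .mean()
        .round(2)
        .rename('AvgOdds')
        .reset_index()
    )
    report = base_data.merge(mean_data, on=['Race', 'Horse'], how='inner')
    report = report[report['Odds'] > report['AvgOdds'] * threshold]
    # Skip horses that were flagged on an earlier run.
    seen = set(base_data.loc[base_data['Reported'] == 'Y', 'Horse'])
    report = report[~report['Horse'].isin(seen)]
    # Flag every row of the newly reported horses for next time.
    base_data.loc[base_data['Horse'].isin(set(report['Horse'])), 'Reported'] = 'Y'
    return report.drop(columns='Reported')


rows = [
    ('Mackay R1', 'Which Lily', 'SportsBetting', 2.45),
    ('Mackay R1', 'Which Lily', 'Bet365', 2.40),
    ('Mackay R1', 'Molongle Drifter', 'Ubet', 9.00),
    ('Mackay R1', 'Molongle Drifter', 'BetEasy', 8.50),
    ('Mackay R1', 'Molongle Drifter', 'Neds', 8.50),
    ('Mackay R10', 'Cold Power', 'SportsBetting', 8.10),
    ('Mackay R10', 'Cold Power', 'Bet365', 8.00),
    ('Mackay R10', 'All Star Rocket', 'Ubet', 7.20),
    ('Mackay R10', 'Dawlish', 'Sportsbet', 14.00),
    ('Mackay R10', 'Dawlish', 'SportsBetting', 27.20),
]
base_data = pd.DataFrame(rows, columns=['Race', 'Horse', 'Bookmaker', 'Odds'])
base_data['Reported'] = 'N'  # nothing has been reported yet

first = report_new_discrepancies(base_data)   # Dawlish at SportsBetting (27.20 vs. average 20.60)
second = report_new_discrepancies(base_data)  # empty: Dawlish is already flagged
print(first)
print(second.empty)
```

On these rows only Dawlish clears the 15% threshold, so both of its rows end up flagged 'Y' and the second call returns nothing.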
