Calculating a mean that gives more weight to recent observations



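I'm building an algorithm that uses performance in previous games to predict the outcome of sporting events. For example, I might have two lists like these: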

# list of game numbers
game_number = [1, 2, 3, 4, 5, 6, 7] 
# list of points scored   
points_scored = [100, 106, 99, 106, 89, 94, 113]

I can easily calculate the mean with:

import numpy as np
# calculate mean
mean_points_scored = np.mean(points_scored)

However, I'd like the most recent games to carry more weight when calculating the mean. Does anyone have experience doing this?

You could compute a weighted average with np.average:

mean_points_scored = np.average(points_scored, weights=game_number)
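To make it clearer what that call computes, here is a minimal sketch that works out the same weighted mean by hand as sum(w_i * x_i) / sum(w_i) and compares it with np.average. The names `manual` and `library` are only for illustration and are not from the original post.

```python
import numpy as np

game_number = [1, 2, 3, 4, 5, 6, 7]
points_scored = [100, 106, 99, 106, 89, 94, 113]

# Weighted mean by hand: sum(w_i * x_i) / sum(w_i)
weights = np.asarray(game_number, dtype=float)
points = np.asarray(points_scored, dtype=float)
manual = (weights * points).sum() / weights.sum()

# The same quantity from np.average
library = np.average(points_scored, weights=game_number)

print(manual, library)
```

With these numbers the weighted sum is 2833 and the weights add up to 28, so both values come out at about 101.18, compared with a plain mean of 101.0.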

I think the weights need to be defined in a separate array:

weights_define = [1, 1, 1, 1, 1, 2, 3]
mean_points_scored = np.average(points_scored, weights=weights_define)  

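The way Wilkben defines the weights is not accurate: using the game numbers directly as weights exaggerates the effect and is not mathematically justified.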
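One way to compare the two weighting schemes is to normalise each set of weights so it sums to 1 and look at the share each game receives. This is only a sketch for inspection; the `schemes` and `shares` dictionaries are illustrative names, not part of either answer.

```python
import numpy as np

points_scored = [100, 106, 99, 106, 89, 94, 113]

# The two weight lists proposed in this thread
schemes = {
    "game_number": [1, 2, 3, 4, 5, 6, 7],
    "weights_define": [1, 1, 1, 1, 1, 2, 3],
}

shares = {}
for name, w in schemes.items():
    w = np.asarray(w, dtype=float)
    # Fraction of the total weight assigned to each game
    shares[name] = w / w.sum()
    mean = np.average(points_scored, weights=w)
    print(name, np.round(shares[name], 3), round(float(mean), 3))
```

Under game_number the first game gets 1/28 of the weight and the last game 7/28; under weights_define the first game gets 1/10 and the last game 3/10. Which profile is appropriate depends on how quickly you want older games to fade.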

You can check the Excel explanation, which shows how the math really works (code is basically math, don't forget!) --> Excel Debunk

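The weights can be defined according to whatever criteria you choose, as below. The factor applied to x can change, and the number of parts in the weights list can vary with your requirements. Suppose a, b and c are three parts covering 15 data points (the factor on x is larger toward the end of the weights list, since the goal is to give recent games heavier weight):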

a = [(3*x) for x in range(1,6)]
b = [(4*x) for x in range(6,11)]
c = [(7*x) for x in range(11,16)]
weights_define = a+b+c
game_number = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
points_scored = [100, 106, 99, 106, 89, 94, 113, 112, 109, 111, 97, 95, 102, 107, 103]
mean_points_scored = np.average(points_scored, weights=weights_define)  
print(mean_points_scored)

Output:

102.77878787878788
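If the number of parts or the factors change often, the three list comprehensions could be folded into a small helper. The function below is a hypothetical sketch (the name `segment_weights` and its parameters are my own, not from the answer); it assumes the data points split evenly across the parts.

```python
import numpy as np

def segment_weights(n_points, factors):
    """Hypothetical helper: split positions 1..n_points into len(factors)
    equal parts and multiply each position by its part's factor."""
    if n_points % len(factors) != 0:
        raise ValueError("n_points must divide evenly across the factors")
    size = n_points // len(factors)
    positions = np.arange(1, n_points + 1)
    # Repeat each factor once per position in its part, e.g. [3,3,3,3,3,4,...]
    return positions * np.repeat(factors, size)

points_scored = [100, 106, 99, 106, 89, 94, 113, 112, 109, 111, 97, 95, 102, 107, 103]

# Same weights as a + b + c above: factors 3, 4 and 7 over three parts of 5
weights_define = segment_weights(len(points_scored), [3, 4, 7])
mean_points_scored = np.average(points_scored, weights=weights_define)
print(mean_points_scored)
```

With factors [3, 4, 7] this reproduces the weights a + b + c, so the weighted mean should match the output shown above.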
