Weighted median and list comprehension



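I want to compute the weighted median from a list of unique values and a list of weights. Each weight says how often the corresponding value occurs in the data.

Example: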

real_data = [1,1,2,3,3,4,4,4]
values = [1,2,3,4]
weights = [2,1,2,3]

One approach would be:

np.median(np.repeat(values, weights))

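However, this seems inefficient to me, because it first builds the entire expanded list, which could become a problem when the weights are large. Is there a more efficient way?

Also, out of curiosity, can you think of a way to write np.repeat as a list comprehension?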
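
For the np.repeat part, one possible equivalent is a nested comprehension over zip(values, weights). This is only a minimal sketch, assuming the weights are non-negative integers; the name `expanded` is made up for illustration. It still builds the full list, so it does not address the memory concern.

```python
values = [1, 2, 3, 4]
weights = [2, 1, 2, 3]

# Repeat each value w times, mirroring np.repeat(values, weights)
expanded = [v for v, w in zip(values, weights) for _ in range(w)]
print(expanded)  # [1, 1, 2, 3, 3, 4, 4, 4]
```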

The solution I came up with:

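def median_3(weights, values):
    # Assumes values are sorted ascending and all weights are positive integers.
    s = 0             # running count of items seen so far
    n = sum(weights)  # total number of items in the expanded data
    for i, w in enumerate(weights):
        s += w
        if s > n / 2:  # first value whose cumulative count passes the midpoint
            if n % 2 == 0:
                if s - w == n / 2:
                    # even count and the midpoint falls between two values
                    return (values[i] + values[i - 1]) / 2
                else:
                    return values[i]
            else:
                return values[i]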
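
The same cumulative-count idea can probably be vectorized with NumPy. The following is a rough sketch rather than a tuned implementation: the name `median_cumsum` is hypothetical, and it assumes `values` is sorted ascending and `weights` are non-negative integers with a positive total. It locates the two middle positions of the expanded data with np.searchsorted on the cumulative weights, without building that data.

```python
import numpy as np

def median_cumsum(weights, values):
    """Median of the expanded data without materializing it (sketch)."""
    values = np.asarray(values)
    cum = np.cumsum(weights)   # cum[i] = number of items up to and including values[i]
    n = cum[-1]                # total number of items
    # 0-based positions of the two middle items (identical when n is odd)
    lo_pos, hi_pos = (n - 1) // 2, n // 2
    # An item at position p belongs to the first group whose cumulative count exceeds p
    lo_idx = np.searchsorted(cum, lo_pos, side="right")
    hi_idx = np.searchsorted(cum, hi_pos, side="right")
    return (values[lo_idx] + values[hi_idx]) / 2
```

With the sample data, `median_cumsum([2, 1, 2, 3], [1, 2, 3, 4])` gives 3.0, the same as np.median on the expanded list. Whether it beats the plain loop likely depends on how many distinct values there are.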

Timing comparison code:

import timeit

import numpy as np

# Sample data from the question
values = [1, 2, 3, 4]
weights = [2, 1, 2, 3]

def median_1(weights, values):
    # Baseline: expand the data, then take the ordinary median
    return np.median(np.repeat(values, weights))

def median_3(weights, values):
    # Walk the cumulative weights until the midpoint is passed
    s = 0
    n = sum(weights)
    for i, w in enumerate(weights):
        s += w
        if s > n / 2:
            if n % 2 == 0:
                if s - w == n / 2:
                    return (values[i] + values[i - 1]) / 2
                else:
                    return values[i]
            else:
                return values[i]

t1 = timeit.Timer(lambda: median_1(weights, values))
t3 = timeit.Timer(lambda: median_3(weights, values))
print(f"function median_1 for 1000 cycles: {t1.timeit(1000)} s")
print(f"function median 3 for 1000 cycles: {t3.timeit(1000)} s")

print(f" result from median_1 {median_1(weights, values)}")
print(f" result from median_3 {median_3(weights, values)}")

Results:

function median_1 for 1000 cycles: 0.051409600000000055 s
function median 3 for 1000 cycles: 0.0013161999999999896 s
result from median_1 3.0
result from median_3 3

Hope this helps. It should also work for an even number of elements.
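
One way to gain confidence in the even-count case is to compare against a brute-force reference on random inputs. This is a small sketch: the helper names `brute_force_median` and `agrees_with_brute_force` are made up here, and the generated inputs assume unique ascending values with strictly positive integer weights.

```python
import random
import statistics

def brute_force_median(weights, values):
    """Reference answer: expand the data, then take the plain median."""
    expanded = [v for v, w in zip(values, weights) for _ in range(w)]
    return statistics.median(expanded)

def agrees_with_brute_force(fn, trials=200, seed=0):
    """Return True if fn(weights, values) matches the reference on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        k = rng.randint(1, 6)
        values = sorted(rng.sample(range(100), k))       # unique, ascending
        weights = [rng.randint(1, 5) for _ in range(k)]  # strictly positive
        if fn(weights, values) != brute_force_median(weights, values):
            return False
    return True
```

Calling `agrees_with_brute_force(median_3)` with the function defined above would then exercise both odd and even totals.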
