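I want to compute the weighted median of a list of unique values and a list of weights. Each weight is the number of times the corresponding value occurs in the full data.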
Example:
```python
real_data = [1,1,2,3,3,4,4,4]
values = [1,2,3,4]
weights = [2,1,2,3]
```
One way to do it would be:
```python
np.median(np.repeat(values, weights))
```
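However, this feels somewhat inefficient, because it first builds the entire expanded list, which could become a problem if the weights are large. Is there a more efficient way?
Also, out of curiosity, can you think of a way to write np.repeat as a list comprehension?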
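For the list-comprehension part of the question, a nested comprehension can reproduce `np.repeat` for 1-D inputs, assuming the weights are non-negative integers. Note that it still materializes the full expanded list (as a Python `list` rather than an `ndarray`), so it does not help with the efficiency concern.

```python
values = [1, 2, 3, 4]
weights = [2, 1, 2, 3]

# For each (value, weight) pair, emit the value `weight` times.
# Same elements as list(np.repeat(values, weights)) for 1-D inputs
# with non-negative integer weights.
repeated = [v for v, w in zip(values, weights) for _ in range(w)]
print(repeated)  # [1, 1, 2, 3, 3, 4, 4, 4]
```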
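My proposed solution (it assumes `values` is sorted in ascending order and walks the cumulative weights until it passes the halfway point):
```python
def median_3(weights, values):
    """Median of the data described by sorted unique `values` and integer `weights`."""
    s = 0             # running count of elements seen so far
    n = sum(weights)  # total number of elements in the expanded data
    prev = None       # last value with a non-zero weight (needed for even n)
    for v, w in zip(values, weights):
        if w == 0:
            continue  # zero-weight values never occur in the data
        s += w
        if s > n / 2:
            # Even n and the earlier values fill exactly the lower half:
            # the two middle elements are `prev` and `v`, so average them.
            if n % 2 == 0 and s - w == n / 2:
                return (prev + v) / 2
            return v
        prev = v
```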
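A vectorized alternative, sketched here under the assumption that `values` is sorted ascending and `weights` are non-negative integers with a positive sum: compute the cumulative weights once with `np.cumsum`, then use `np.searchsorted` to find which value sits at the middle position(s) of the expanded array, without ever building that array. The name `weighted_median_np` is hypothetical, and this is a swapped-in technique rather than the loop above. Memory use is proportional to the number of unique values, not to the total count.

```python
import numpy as np


def weighted_median_np(weights, values):
    """Median of np.repeat(values, weights) without building the repeated array.

    Assumes `values` is sorted ascending and `weights` are non-negative
    integers with a positive sum.
    """
    values = np.asarray(values)
    cum = np.cumsum(weights)  # cum[i] = number of elements up to and including values[i]
    n = cum[-1]               # total number of elements
    # 0-based positions of the middle element(s) in the virtual expanded array
    lo = (n - 1) // 2
    hi = n // 2
    # searchsorted(..., side="right") returns the first i with cum[i] > k,
    # i.e. the value that occupies position k; zero-weight values are skipped.
    i_lo = np.searchsorted(cum, lo, side="right")
    i_hi = np.searchsorted(cum, hi, side="right")
    return (values[i_lo] + values[i_hi]) / 2
```

For the example data, `weighted_median_np([2, 1, 2, 3], [1, 2, 3, 4])` returns `3.0`, the same as `np.median(np.repeat(values, weights))`.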
Timing comparison code:
```python
import timeit

import numpy as np

values = [1, 2, 3, 4]
weights = [2, 1, 2, 3]

def median_1(weights, values):
    # Baseline: expand the data and let NumPy compute the median
    return np.median(np.repeat(values, weights))

def median_3(weights, values):
    # Walk the cumulative weights until passing n/2 (values must be sorted)
    s = 0
    n = sum(weights)
    prev = None
    for v, w in zip(values, weights):
        if w == 0:
            continue
        s += w
        if s > n / 2:
            if n % 2 == 0 and s - w == n / 2:
                return (prev + v) / 2
            return v
        prev = v

t1 = timeit.Timer(lambda: median_1(weights, values))
t3 = timeit.Timer(lambda: median_3(weights, values))
print(f"function median_1 for 1000 cycles: {t1.timeit(1000)} s")
print(f"function median 3 for 1000 cycles: {t3.timeit(1000)} s")
print(f" result from median_1 {median_1(weights, values)}")
print(f" result from median_3 {median_3(weights, values)}")
```
Results:
```
function median_1 for 1000 cycles: 0.051409600000000055 s
function median 3 for 1000 cycles: 0.0013161999999999896 s
 result from median_1 3.0
 result from median_3 3
```
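Hope this helps. It also handles an even total count: when the two middle elements belong to different values, it returns their average (for example, `values = [1, 2, 3, 4]` with `weights = [1, 1, 1, 1]` gives 2.5).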