Generator that computes the running average of an iterator



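I have an iterator that produces prime numbers. I want to create a generator that takes this prime iterator as its input argument and computes the running average of the primes it has iterated over: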

from itertools import islice, tee
def only_primes(stream):
    try:
        while True:
            is_valid, value = next(stream)
            while not is_valid:
                is_valid, value = next(stream)
            yield value
    except StopIteration:
        return
def is_prime(n):
    if n < 2:
        return False, n
    elif n == 2:
        return True, n
    sqrt_n = int(n**0.5)+1
    return len([i for i in range(2, sqrt_n+1) if n % i == 0]) == 0, n
prime_iterator = only_primes(map(is_prime, range(100)))
def prime_av(stream):
    """Generator that yields average value of looped prime numbers"""
    n = 0
    stats = dict()
    stats['mean'] = 0
    try:
        while True:
            prime = next(stream)
            n += 1
            stats['mean'] *= n - 1
            stats['mean'] += prime
            stats['mean'] /= n
            yield stats
    except StopIteration:
        return

If I loop over the raw iterator and the prime_av(stats) iterator at the same time, only the last average is printed. Why?

raw, stats = tee(prime_iterator)    
list(islice(zip(raw, prime_av(stats)), 10))

Output:

[(2, {'mean': 12.9}),
 (3, {'mean': 12.9}),
 (5, {'mean': 12.9}),
 (7, {'mean': 12.9}),
 (11, {'mean': 12.9}),
 (13, {'mean': 12.9}),
 (17, {'mean': 12.9}),
 (19, {'mean': 12.9}),
 (23, {'mean': 12.9}),
 (29, {'mean': 12.9})]

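The problem is that the averaging generator keeps mutating the same dictionary object and yielding it. If you print each result during the loop, you see what you expect. If you collect the results in a list (as you are doing), the list ends up holding nothing but references to that one object, so every entry shows the last average that was computed.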
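The effect can be reproduced without any primes. The following minimal sketch uses a hypothetical generator, `counter` (not part of the question's code), that mutates and yields a single dictionary; collecting its output in a list stores the same object several times.

```python
def counter(limit):
    """Hypothetical generator: mutates and yields ONE shared dict."""
    state = {"count": 0}
    for _ in range(limit):
        state["count"] += 1
        yield state  # the same object every time, not a snapshot

collected = list(counter(3))

# Every list entry is the very same dictionary object.
all_same_object = all(item is collected[0] for item in collected)

print(collected)        # [{'count': 3}, {'count': 3}, {'count': 3}]
print(all_same_object)  # True
```

The intermediate counts 1 and 2 did exist while the generator was running, but nothing kept a separate copy of them.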

For example, changing the code to:

def prime_av(stream):
    """Generator that yields average value of looped prime numbers"""
    n = 0
    S = 0
    try:
        while True:
            prime = next(stream)
            n += 1
            S *= n - 1
            S += prime
            S /= n
            yield {"mean": S}
    except StopIteration:
        return

will behave as you expect, because a new dictionary is allocated on every iteration.
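If you would rather keep one mutable `stats` dictionary inside the generator (for instance, to add more statistics later), a possible alternative is to yield a shallow copy at each step. This is only a sketch under that assumption: the name `prime_av_copy` is made up here, it tracks a running total instead of rescaling the mean, and a short hard-coded list stands in for the prime iterator.

```python
def prime_av_copy(stream):
    """Yield a snapshot (shallow copy) of the running statistics."""
    n = 0
    total = 0
    stats = {"mean": 0.0}
    for prime in stream:
        n += 1
        total += prime
        stats["mean"] = total / n
        yield dict(stats)  # a new dict each time, so stored results stay distinct

# Stand-in input (assumption): the first four primes instead of prime_iterator.
snapshots = list(prime_av_copy(iter([2, 3, 5, 7])))
print(snapshots)
```

A shallow copy is enough here because the values are plain numbers; if `stats` held nested mutable objects, `copy.deepcopy` would be the safer choice.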

The problem is here:

stats['mean'] = 0

and here:

yield stats

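You see the same value printed repeatedly because you are repeatedly yielding a reference to the same dictionary. Those references are all stored in one list, and only then is the list printed. If you want to see the intermediate state of the dictionary after each update, print it after each change instead of performing all the updates first and printing afterwards. That is as simple as changing this: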

print(list(islice(zip(raw, prime_av(stats)), 10)))

to this:

for i in islice(zip(raw, prime_av(stats)), 10):
    print(*i)

If you need a list of these means, you have to append each one to a list rather than repeatedly overwriting a single value, by changing this:

def prime_av(stream):
    """Generator that yields average value of looped prime numbers"""
    n = 0
    stats = dict()
    stats['mean'] = 0
    try:
        while True:
            prime = next(stream)
            n += 1
            stats['mean'] *= n - 1
            stats['mean'] += prime
            stats['mean'] /= n
            yield stats
    except StopIteration:
        return

to this:

def prime_av(stream):
    """Generator that yields average value of looped prime numbers"""
    n = 0
    stats = dict()
    stats['mean'] = [0]
    try:
        while True:
            prime = next(stream)
            n += 1
            stats['mean'].append(stats['mean'][-1])
            stats['mean'][-1] *= n - 1
            stats['mean'][-1] += prime
            stats['mean'][-1] /= n
            yield stats
    except StopIteration:
        return

Then, when you do this:

x = list(islice(zip(raw, prime_av(stats)), 10))

the dictionary of values is in x[1][1] (every x[i][1] refers to the same dictionary):

{'mean': [0, 2.0, 2.5, 3.3333333333333335, 4.25, 5.6, 6.833333333333333, 8.285714285714286, 9.625, 11.11111111111111, 12.9]}
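If only the sequence of running means is needed, a more compact option may be `itertools.accumulate`, which yields running sums; dividing each sum by its position gives the mean so far. This is a sketch rather than a drop-in replacement: `running_means` is a hypothetical name, and the first ten primes are hard-coded in place of the prime iterator.

```python
from itertools import accumulate

def running_means(stream):
    """Yield the mean of all values seen so far, one float per input value."""
    for count, total in enumerate(accumulate(stream), start=1):
        yield total / count

# Stand-in input (assumption): the first ten primes.
first_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
means = list(running_means(first_primes))
print(means)
```

Because each yielded value is an immutable float rather than a shared dictionary, storing the results in a list keeps every intermediate mean.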
