Iterating over potential generators



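I need to iterate over a stream of pandas.Series objects (the actual objects I want to use are not relevant here). Optionally, an arbitrary function is applied to each Series, and here is the crux: that arbitrary function may be a generator function that yields two (or more) values. I had hopes for the more_itertools.flatten function, but it does not help, because it breaks when a regular function, or no function at all, is mapped over the generator. Is there a way to turn this iterable into a simple generator of Series objects? Here is a simple example that illustrates the problem: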

In [1]: from more_itertools import flatten
...: 
...: def generator():
...:     for i in range(10):
...:         yield i
...: 
...: def postprocess1(i):
...:     yield 2*i
...: 
...: def postprocess1_return(i):
...:     return 2*i
...: 
...: def postprocess2(i):
...:     yield from (i, 2*i)
...: 
In [2]: list(generator())
...: 
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: list(map(postprocess1, generator()))
...: 
Out[3]: 
[<generator object postprocess1 at 0x7f5a402916d0>,
<generator object postprocess1 at 0x7f5a40291e40>,
<generator object postprocess1 at 0x7f5a40291f20>,
<generator object postprocess1 at 0x7f5a40291dd0>,
<generator object postprocess1 at 0x7f5a40291eb0>,
<generator object postprocess1 at 0x7f5a40209040>,
<generator object postprocess1 at 0x7f5a40209190>,
<generator object postprocess1 at 0x7f5a402092e0>,
<generator object postprocess1 at 0x7f5a402090b0>,
<generator object postprocess1 at 0x7f5a40209350>]
In [4]: list(map(postprocess1_return, generator()))
...: 
Out[4]: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
In [5]: list(map(postprocess2, generator()))
...: 
Out[5]: 
[<generator object postprocess2 at 0x7f5a403ad430>,
<generator object postprocess2 at 0x7f5a40209580>,
<generator object postprocess2 at 0x7f5a402097b0>,
<generator object postprocess2 at 0x7f5a40209510>,
<generator object postprocess2 at 0x7f5a40209430>,
<generator object postprocess2 at 0x7f5a40209740>,
<generator object postprocess2 at 0x7f5a402096d0>,
<generator object postprocess2 at 0x7f5a40209820>,
<generator object postprocess2 at 0x7f5a40209660>,
<generator object postprocess2 at 0x7f5a40209890>]
In [6]: list(flatten(generator()))
...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-7cd770547fa4> in <module>
----> 1 list(flatten(generator()))
TypeError: 'int' object is not iterable
In [7]: list(flatten(map(postprocess1, generator())))
...: 
Out[7]: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
In [8]: list(flatten(map(postprocess1_return, generator())))
...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-35ce9aef7285> in <module>
----> 1 list(flatten(map(postprocess1_return, generator())))
TypeError: 'int' object is not iterable
In [9]: list(flatten(map(postprocess2, generator())))
Out[9]: [0, 0, 1, 2, 2, 4, 3, 6, 4, 8, 5, 10, 6, 12, 7, 14, 8, 16, 9, 18]

I found it: more_itertools.collapse(generator, base_type=pd.Series) does the trick!

Evidently, the base type really matters: in my actual code, without base_type=pd.Series, all the elements of each Series are yielded one by one, which is not what I want.
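For anyone who would rather not depend on more_itertools, here is a minimal standard-library sketch of the same idea. It is not how collapse is implemented. The helper name expand_generators is made up for this example, and a plain tuple stands in for a pd.Series. It only unpacks results that are actual generator objects and passes everything else through untouched, so an iterable item such as a Series should not get split into its elements.

```python
from types import GeneratorType


def generator():
    for i in range(10):
        yield i


def postprocess1_return(i):
    # Regular function: returns a single value.
    return 2 * i


def postprocess2(i):
    # Generator function: yields two values per input.
    yield from (i, 2 * i)


def expand_generators(results):
    """Hypothetical helper: unpack generator results, pass the rest through."""
    for item in results:
        if isinstance(item, GeneratorType):
            # The post-processing step was a generator function.
            yield from item
        else:
            # Plain value (or an iterable such as a Series): keep it whole.
            yield item


# No function mapped, a regular function, and a generator function all work.
plain = list(expand_generators(generator()))
doubled = list(expand_generators(map(postprocess1_return, generator())))
paired = list(expand_generators(map(postprocess2, generator())))

# Tuples stand in for Series here: they are iterable but are not split up.
kept_whole = list(expand_generators([(1, 2), (3, 4)]))
```

The trade-off compared with collapse is that this only handles one level of nesting and only recognises true generator objects; a post-processing function that returns a list would be passed through as a single item.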
