在Python中通过生成器组合字符

我试图获得长度为(8)的字符串的替换(换句话说，笛卡尔积)的所有组合。输出太大，我在一次将它存储在内存中有困难，所以我的进程在完成之前就被终止了。

这是我使用的代码，它比Python的stdlib itertools快得多:

import numpy as np
def cartesian(arrays, out=None):
"""Generate a cartesian product of input arrays.
Parameters
----------
arrays : list of array-like
    1-D arrays to form the cartesian product of.
out : ndarray
    Array to place the cartesian product in.
Returns
-------
out : ndarray
    2-D array of shape (M, len(arrays)) containing cartesian products
    formed of input arrays.
Examples
--------
>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])
"""
arrays = [np.asarray(x) for x in arrays]
shape = (len(x) for x in arrays)
dtype = arrays[0].dtype
ix = np.indices(shape)
ix = ix.reshape(len(arrays), -1).T
if out is None:
    out = np.empty_like(ix, dtype=dtype)
for n, arr in enumerate(arrays):
    out[:, n] = arrays[n][ix[:, n]]
return out

我如何让它从结果中返回一个生成器，而不是一次将所有内容存储到内存中?

我从其他问题中得到的印象是，product是迭代生成笛卡尔组合的最快方法:

In [494]: g=itertools.product([1,2,3],[4,5],[6,7])
In [495]: list(g)
Out[495]: 
[(1, 4, 6),
 (1, 4, 7),
 (1, 5, 6),
 (1, 5, 7),
 (2, 4, 6),
 (2, 4, 7),
 (2, 5, 6),
 (2, 5, 7),
 (3, 4, 6),
 (3, 4, 7),
 (3, 5, 6),
 (3, 5, 7)]

你的代码是np.indices的映射，这是较慢的:

In [499]: timeit np.indices((3,2,2)).reshape(3,-1).T
The slowest run took 11.08 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 61.6 µs per loop
In [500]: timeit list(itertools.product([1,2,3],[4,5],[6,7]))
100000 loops, best of 3: 3.51 µs per loop

相关内容

最新更新

热门标签：