我试图获得长度为(8)的字符串的替换(换句话说,笛卡尔积)的所有组合。输出太大,我在一次将它存储在内存中有困难,所以我的进程在完成之前就被终止了。
这是我使用的代码,它比Python的stdlib itertools快得多:
import numpy as np
def cartesian(arrays, out=None):
"""Generate a cartesian product of input arrays.
Parameters
----------
arrays : list of array-like
1-D arrays to form the cartesian product of.
out : ndarray
Array to place the cartesian product in.
Returns
-------
out : ndarray
2-D array of shape (M, len(arrays)) containing cartesian products
formed of input arrays.
Examples
--------
>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
[1, 4, 7],
[1, 5, 6],
[1, 5, 7],
[2, 4, 6],
[2, 4, 7],
[2, 5, 6],
[2, 5, 7],
[3, 4, 6],
[3, 4, 7],
[3, 5, 6],
[3, 5, 7]])
"""
arrays = [np.asarray(x) for x in arrays]
shape = (len(x) for x in arrays)
dtype = arrays[0].dtype
ix = np.indices(shape)
ix = ix.reshape(len(arrays), -1).T
if out is None:
out = np.empty_like(ix, dtype=dtype)
for n, arr in enumerate(arrays):
out[:, n] = arrays[n][ix[:, n]]
return out
我如何让它从结果中返回一个生成器,而不是一次将所有内容存储到内存中?
我从其他问题中得到的印象是,product
是迭代生成笛卡尔组合的最快方法:
In [494]: g=itertools.product([1,2,3],[4,5],[6,7])
In [495]: list(g)
Out[495]:
[(1, 4, 6),
(1, 4, 7),
(1, 5, 6),
(1, 5, 7),
(2, 4, 6),
(2, 4, 7),
(2, 5, 6),
(2, 5, 7),
(3, 4, 6),
(3, 4, 7),
(3, 5, 6),
(3, 5, 7)]
你的代码是np.indices
的映射,这是较慢的:
In [499]: timeit np.indices((3,2,2)).reshape(3,-1).T
The slowest run took 11.08 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 61.6 µs per loop
In [500]: timeit list(itertools.product([1,2,3],[4,5],[6,7]))
100000 loops, best of 3: 3.51 µs per loop