Converting an np.ndarray of tuples (dtype=object) to an array with dtype=int



I need to convert (short) np arrays of tuples to np arrays of ints.

The most obvious approach does not work:

import numpy as np
# array_of_tuples is given, this is just an example:
array_of_tuples = np.zeros(2, dtype=object)
array_of_tuples[0] = 1,2
array_of_tuples[1] = 2,3
np.array(array_of_tuples, dtype=int)
ValueError: setting an array element with a sequence.

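It looks like placing the tuples into a preallocated buffer of fixed size and dtype is the way to go. It seems to avoid much of the overhead associated with computing the size, raggedness, and dtype.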
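A minimal sketch of the preallocated-buffer approach, using the small example array from the question. It assumes every tuple has the same length as the first one; the name `row` is only illustrative. The benchmark below expresses the same loop more compactly as `for i, res[i] in enumerate(array_of_tuples): pass`.

```python
import numpy as np

# Example input from the question: a 1-D object array holding tuples.
array_of_tuples = np.empty(2, dtype=object)
array_of_tuples[0] = (1, 2)
array_of_tuples[1] = (2, 3)

# Allocate the int result once, sized from the first tuple
# (assumption: all tuples have equal length).
res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), dtype=int)

# Copy each tuple into its row; NumPy converts the tuple on assignment.
for i, row in enumerate(array_of_tuples):
    res[i] = row

print(res)
```

If the tuples can differ in length, the row assignment raises an error instead of silently producing a ragged result.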

Here are some slower alternatives, followed by benchmarks:

  • You can cheat and create a dtype with the necessary number of fields, since numpy supports converting tuples to custom dtypes:

    dt = np.dtype([('', int) for _ in range(len(array_of_tuples[0]))])
    res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
    res.view(dt).ravel()[:] = array_of_tuples
    
  • You can stack the arrays:

    np.stack(array_of_tuples, axis=0)
    

    Unfortunately, this is even slower than the other proposed methods.

  • Preallocating the output does not help much:

    res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
    np.stack(array_of_tuples, out=res, axis=0)
    
  • Trying to cheat with np.concatenate, which lets you specify the output dtype, does not help much either:

    np.concatenate(array_of_tuples, dtype=int).reshape(len(array_of_tuples), len(array_of_tuples[0]))
    
  • Neither does preallocating the array:

    res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
    np.concatenate(array_of_tuples, out=res.ravel())
    
  • You can also try doing the concatenation in Python space, which is slow as well:

    np.array(sum(array_of_tuples, start=()), dtype=int).reshape(len(array_of_tuples), len(array_of_tuples[0]))
    

    np.reshape(np.sum(array_of_tuples), (len(array_of_tuples), len(array_of_tuples[0])))
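
One more variant is timed in the benchmarks further down but does not appear in the list above: converting the object array to a plain Python list first. A small sketch on the question's example data (the name `via_list` is only illustrative):

```python
import numpy as np

# Same example input as in the question.
array_of_tuples = np.empty(2, dtype=object)
array_of_tuples[0] = (1, 2)
array_of_tuples[1] = (2, 3)

# tolist() yields a list of tuples, which np.array can parse
# as a regular 2-D nested sequence.
via_list = np.array(array_of_tuples.tolist(), dtype=int)

print(via_list.shape)
```

This is probably the shortest spelling, though in the timings it lands behind the preallocated buffer and the custom dtype trick.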
    
array_of_tuples = np.empty(100, dtype=object)
for i in range(len(array_of_tuples)):
    array_of_tuples[i] = tuple(range(i, i + 100))

%%timeit
res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
for i, res[i] in enumerate(array_of_tuples):
    pass
305 µs ± 8.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

dt = np.dtype([('', 'int',) for _ in range(100)])

%%timeit
res = np.empty((100, 100), int)
res.view(dt).ravel()[:] = array_of_tuples
334 µs ± 5.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.array(array_of_tuples.tolist())
478 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
res = np.empty((100, 100), int)
np.concatenate(array_of_tuples, out=res.ravel())
500 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.concatenate(array_of_tuples, dtype=int).reshape(100, 100)
504 µs ± 7.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
res = np.empty((100, 100), int)
np.stack(array_of_tuples, out=res, axis=0)
557 µs ± 25.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.stack(array_of_tuples, axis=0)
577 µs ± 6.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.array(sum(array_of_tuples, start=()), dtype=int).reshape(100, 100)
1.06 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.reshape(np.sum(array_of_tuples), (100, 100))
1.26 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
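
The timings above come from IPython's `%timeit` magic. A rough sketch of how two of the candidates could be compared outside IPython with the standard-library `timeit` module; the function names `preallocated` and `via_tolist` and the repeat counts are assumptions made for illustration, and absolute numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Same benchmark input as above: 100 tuples of 100 ints each.
array_of_tuples = np.empty(100, dtype=object)
for i in range(len(array_of_tuples)):
    array_of_tuples[i] = tuple(range(i, i + 100))


def preallocated():
    # Fill a preallocated int buffer row by row.
    res = np.empty((100, 100), int)
    for i, row in enumerate(array_of_tuples):
        res[i] = row
    return res


def via_tolist():
    # Go through a Python list of tuples.
    return np.array(array_of_tuples.tolist())


calls = 100
for fn in (preallocated, via_tolist):
    # Take the best of several repeats to reduce noise.
    best = min(timeit.repeat(fn, number=calls, repeat=3)) / calls
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```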
