Arranging the elements of two arrays in a given order with NumPy



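I'm trying to arrange the elements of zeroes and ones in the sequence given by order. The expected output below is what I want to get. Preferably without using a list comprehension.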

import numpy as np
order = np.array([0,1,0,1,0])
zeroes= np.array([10,55, 30])
ones = np.array([3,8])

Expected output:

[10, 3, 55, 8, 30] 

How about this (no Python loops; on a test with 200k elements it is about 750x faster than a list comprehension):

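# note: updated version: faster and more robust to faulty input
def altcat(zeroes, ones, order):
    # positions of the 0s and 1s in order, truncated to the number of values available
    i0 = np.nonzero(order == 0)[0][:len(zeroes)]
    i1 = np.nonzero(order == 1)[0][:len(ones)]

    # output array with the same length as order and the dtype of zeroes
    z = np.zeros_like(order, dtype=zeroes.dtype)
    # scatter each input into its slots, left to right
    z[i0] = zeroes[:len(i0)]
    z[i1] = ones[:len(i1)]
    return z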

With your example:

>>> altcat(zeroes=np.array([10,55, 30]), ones=np.array([3,8]),
...        order=np.array([0,1,0,1,0]))
array([10,  3, 55,  8, 30])
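If the input is guaranteed to be consistent (order contains exactly len(zeroes) zeros and len(ones) ones), a boolean mask can do the same scatter without computing index arrays. This is a minimal sketch under that assumption; altcat_mask is a hypothetical name, and unlike altcat it does not truncate mismatched inputs (NumPy raises a ValueError instead).

```python
import numpy as np


def altcat_mask(zeroes, ones, order):
    # Assumes order has exactly len(zeroes) zeros and len(ones) ones;
    # mismatched sizes make the masked assignments raise ValueError.
    out = np.empty(len(order), dtype=np.result_type(zeroes, ones))
    mask = order == 0
    out[mask] = zeroes   # fills the positions of the 0s, left to right
    out[~mask] = ones    # fills the positions of the 1s, left to right
    return out


print(altcat_mask(np.array([10, 55, 30]), np.array([3, 8]), np.array([0, 1, 0, 1, 0])))
# [10  3 55  8 30]
```

The trade-off is robustness: altcat silently tolerates faulty input, while the mask version fails loudly, which may or may not be what you want.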

# set up
n = 200_000
np.random.seed(0)
order = np.random.randint(0, 2, size=n)
n1 = order.sum()
n0 = n - n1
ones = np.random.randint(100, size=n1)
zeroes = np.random.randint(100, size=n0)
# for comparison, a method proposed elsewhere, based on lists
def altcat_list(zeroes, ones, order):
    zeroes = list(zeroes)
    ones = list(ones)
    return [zeroes.pop(0) if i == 0 else ones.pop(0) for i in order]

Timing:

a = %timeit -o altcat(zeroes, ones, order)
# 2.38 ms ± 573 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
b = %timeit -o altcat_list(zeroes, ones, order)
# 1.84 s ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
b.average / a.average
# 773.59
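The same vectorized idea can also be written with a stable sort: a stable argsort of order lists the positions of all 0s (left to right) followed by all 1s, so scattering the concatenation of zeroes and ones into those positions yields the interleaved result. This is a sketch of a swapped-in alternative, not part of the answer above; altcat_argsort is a hypothetical name, it assumes consistent input like the mask version, and since it sorts it is O(n log n) in general. It has not been timed against altcat here.

```python
import numpy as np


def altcat_argsort(zeroes, ones, order):
    # Assumes order has exactly len(zeroes) zeros and len(ones) ones.
    # A stable sort keeps equal keys in their original order, so pos lists
    # the indices of the 0s first, then the indices of the 1s.
    pos = np.argsort(order, kind="stable")
    out = np.empty(len(order), dtype=np.result_type(zeroes, ones))
    # Scatter zeroes then ones into those positions.
    out[pos] = np.concatenate([zeroes, ones])
    return out


print(altcat_argsort(np.array([10, 55, 30]), np.array([3, 8]), np.array([0, 1, 0, 1, 0])))
# [10  3 55  8 30]
```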

Note: I initially tried n = 1_000_000, but while altcat finished in 12.4 ms, the list-based version took so long that I had to stop it.

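The list-based method appears to scale worse than O(n) (100K: 0.4 s; 200K: 1.84 s; 400K: 10.4 s), most likely because each list.pop(0) has to shift all remaining elements, which makes the whole comprehension roughly O(n²).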
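If the quadratic behaviour comes from list.pop(0), swapping the lists for collections.deque, whose popleft() removes from the front in O(1), should bring the list-based approach back to O(n). This is a minimal sketch of that substitution (altcat_deque is a hypothetical name); it is still a Python-level loop, so it should not be expected to match altcat, and it has not been benchmarked here.

```python
from collections import deque

import numpy as np


def altcat_deque(zeroes, ones, order):
    # deque.popleft() removes from the front in O(1) time, whereas list.pop(0)
    # shifts every remaining element and costs O(n) per call.
    queues = (deque(zeroes), deque(ones))
    return [queues[i].popleft() for i in order]


print(altcat_deque(np.array([10, 55, 30]), np.array([3, 8]), np.array([0, 1, 0, 1, 0])))
```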

Addendum

If you really want to do it with a list comprehension rather than pure NumPy, at least consider this:

def altcat_list_mod(zeroes, ones, order):
    it = [iter(zeroes), iter(ones)]
    return [next(it[i]) for i in order]

This is faster than altcat_list(), but still almost 25x slower than altcat():

# on 200k elements
c = %timeit -o altcat_list_mod(zeroes, ones, order)
# 60 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
c.average / a.average
# 24.93
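Since these timings compare different implementations, it is worth confirming that they return the same values. The sketch below checks altcat, altcat_list and altcat_list_mod against each other on a small random input; it uses np.random.default_rng and 1,000 elements (an arbitrary size chosen to keep the slow list version quick) instead of the np.random.seed setup used for the benchmark.

```python
import numpy as np


def altcat(zeroes, ones, order):
    # vectorized version from the answer
    i0 = np.nonzero(order == 0)[0][:len(zeroes)]
    i1 = np.nonzero(order == 1)[0][:len(ones)]
    z = np.zeros_like(order, dtype=zeroes.dtype)
    z[i0] = zeroes[:len(i0)]
    z[i1] = ones[:len(i1)]
    return z


def altcat_list(zeroes, ones, order):
    # list.pop(0) version used for comparison
    zeroes = list(zeroes)
    ones = list(ones)
    return [zeroes.pop(0) if i == 0 else ones.pop(0) for i in order]


def altcat_list_mod(zeroes, ones, order):
    # iterator-based list comprehension from the addendum
    it = [iter(zeroes), iter(ones)]
    return [next(it[i]) for i in order]


# Small random input with consistent counts of 0s and 1s.
rng = np.random.default_rng(0)
order = rng.integers(0, 2, size=1_000)
n1 = int(order.sum())
ones = rng.integers(100, size=n1)
zeroes = rng.integers(100, size=order.size - n1)

expected = altcat(zeroes, ones, order)
assert np.array_equal(expected, altcat_list(zeroes, ones, order))
assert np.array_equal(expected, altcat_list_mod(zeroes, ones, order))
print("all three versions agree")
```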

Latest update