给定一些 numpy 数组a
array([2,2,3,3,2,0,0,0,2,2,3,2,0,1,1,0])
获取所有n
指数组的最佳方法是什么,每个指数组在a
中都有不同的值?
显然,没有比a
中唯一元素数大的群,这里为4。
例如,一组大小为 4 的组是
array([0,2,5,13])
考虑到a
可能很长,比如说高达 250k。
如果结果变得太大,则可能还希望不要计算所有此类组,而只计算请求的第一个k
。
对于作为整数的输入,我们可以有一个基于this post
-
In [41]: sidx = a.argsort() # use kind='mergesort' for first occurences
In [42]: c = np.bincount(a)
In [43]: np.sort(sidx[np.r_[0,(c[c!=0])[:-1].cumsum()]])
Out[43]: array([ 0, 2, 5, 13])
另一个与以前的通用输入方法密切相关 -
In [44]: b = a[sidx]
In [45]: np.sort(sidx[np.r_[True,b[:-1]!=b[1:]]])
Out[45]: array([ 0, 2, 5, 13])
另一个具有内存效率和性能numba
,可以沿着这些唯一组选择第一个索引,并且还带有额外的k
参数 -
from numba import njit
@njit
def _numba1(a, notfound, out, k):
iterID = 0
for i,e in enumerate(a):
if notfound[e]:
notfound[e] = False
out[iterID] = i
iterID += 1
if iterID>=k:
break
return out
def unique_elems(a, k, maxnum=None):
# feed in max of the input array as maxnum value if known
if maxnum is None:
L = a.max()+1
else:
L = maxnum+1
notfound = np.ones(L, dtype=bool)
out = np.ones(k, dtype=a.dtype)
return _numba1(a, notfound, out, k)
示例运行 -
In [16]: np.random.seed(0)
...: a = np.random.randint(0,10,200)
In [17]: a
Out[17]:
array([5, 0, 3, 3, 7, 9, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 9,
8, 9, 4, 3, 0, 3, 5, 0, 2, 3, 8, 1, 3, 3, 3, 7, 0, 1, 9, 9, 0, 4,
7, 3, 2, 7, 2, 0, 0, 4, 5, 5, 6, 8, 4, 1, 4, 9, 8, 1, 1, 7, 9, 9,
3, 6, 7, 2, 0, 3, 5, 9, 4, 4, 6, 4, 4, 3, 4, 4, 8, 4, 3, 7, 5, 5,
0, 1, 5, 9, 3, 0, 5, 0, 1, 2, 4, 2, 0, 3, 2, 0, 7, 5, 9, 0, 2, 7,
2, 9, 2, 3, 3, 2, 3, 4, 1, 2, 9, 1, 4, 6, 8, 2, 3, 0, 0, 6, 0, 6,
3, 3, 8, 8, 8, 2, 3, 2, 0, 8, 8, 3, 8, 2, 8, 4, 3, 0, 4, 3, 6, 9,
8, 0, 8, 5, 9, 0, 9, 6, 5, 3, 1, 8, 0, 4, 9, 6, 5, 7, 8, 8, 9, 2,
8, 6, 6, 9, 1, 6, 8, 8, 3, 2, 3, 6, 3, 6, 5, 7, 0, 8, 4, 6, 5, 8,
2, 3])
In [19]: unique_elems(a, k=6)
Out[19]: array([0, 1, 2, 4, 5, 8])
为此作业使用 Numpy.unique。还有其他几个选项,例如可以返回每个唯一项在 a 中出现的次数。
import numpy as np
# Sample data
a = np.array([2,2,3,3,2,0,0,0,2,2,3,2,0,1,1,0])
# The unique values are in 'u'
# The indices of the first occurence of the unique values are in 'indices'
u, indices = np.unique(a, return_index=True)