Improve performance by avoiding Python loops when summing over unique rows with numpy

I have two numpy arrays, A with shape (N, 3) and B with shape (N,). From A I generate an array containing only its unique rows, for example:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])
B = np.array([10., 33., 15., 17.])
AUnique, directInd, inverseInd, counts = np.unique(A,
                                                   return_index=True,
                                                   return_inverse=True,
                                                   return_counts=True,
                                                   axis=0)

so that AUnique is array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).
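To make the three index arrays concrete, here is a small sketch on the example data. The `reshape(-1)` line is a precaution I added (not part of the question): the shape of the inverse array returned when `axis` is given changed briefly around NumPy 2.0, and flattening it keeps the indexing below working either way.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])

AUnique, directInd, inverseInd, counts = np.unique(
    A, return_index=True, return_inverse=True, return_counts=True, axis=0)
# Precaution (assumption): force a 1-D inverse regardless of NumPy version
inverseInd = inverseInd.reshape(-1)

print(directInd)   # [0 1 3]   first position in A of each unique row
print(inverseInd)  # [0 1 0 2] for each row of A, its row index in AUnique
print(counts)      # [2 1 1]   number of occurrences of each unique row

# The inverse indices rebuild A from AUnique
assert np.array_equal(AUnique[inverseInd], A)
```

The inverse array is the key piece: it labels every row of A with the group it belongs to, which is exactly what a per-group sum of B needs.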

Then I build the B-like vector associated with AUnique: for each row of A that is not unique, I sum the corresponding values of B into that vector, i.e.:

BNew = B[directInd]  # B at the first occurrence of each unique row (a copy)
# here BNew is [10., 33., 17.]
for Id in np.asarray(counts > 1).nonzero()[0]:
    BNew[Id] = np.sum(B[inverseInd == Id])
# here BNew is [25., 33., 17.]

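The problem is that for large N (millions or tens of millions of rows) the for loop becomes very slow, and I'd like to know whether there is a way to avoid the loop and/or make the code faster.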
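A self-contained version of the loop, wrapped in a hypothetical helper `sum_duplicates_loop`, makes the cost visible: every iteration evaluates `inverseInd == Id`, which scans all N entries, so the total work grows roughly as N times the number of duplicated rows. With millions of rows and many duplicates that product becomes very large.

```python
import numpy as np

def sum_duplicates_loop(B, directInd, inverseInd, counts):
    """Hypothetical helper wrapping the loop from the question."""
    BNew = B[directInd]  # fancy indexing returns a copy, so B is not modified
    for Id in np.nonzero(counts > 1)[0]:
        # Builds a boolean mask over all N rows on every iteration: O(N) each time
        BNew[Id] = np.sum(B[inverseInd == Id])
    return BNew

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])
B = np.array([10., 33., 15., 17.])

_, directInd, inverseInd, counts = np.unique(
    A, return_index=True, return_inverse=True, return_counts=True, axis=0)
inverseInd = inverseInd.reshape(-1)  # precaution: ensure a 1-D inverse

BNew = sum_duplicates_loop(B, directInd, inverseInd, counts)
print(BNew)  # [25. 33. 17.]
```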

Thanks in advance!

I think you can do what you want with np.bincount:

BNew = np.bincount(inverseInd, weights = B)
BNew
Out[]: array([25., 33., 17.])
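A minimal self-contained sketch of the bincount approach. Only `return_inverse` is needed; the direct indices and counts are not used. `np.bincount(inverseInd, weights=B)` adds `B[i]` into bin `inverseInd[i]` in a single pass, so it replaces the whole loop, including the rows that occur only once. The `minlength` argument and the helper names are my own additions (hypothetical), and `np.add.at` is shown only as an alternative for cross-checking; it is an unbuffered in-place add that also accumulates repeated indices correctly.

```python
import numpy as np

def sum_duplicates_bincount(B, inverseInd, n_unique):
    """Hypothetical helper: sum B over groups of identical rows in one pass."""
    # minlength guarantees exactly one bin per unique row
    return np.bincount(inverseInd, weights=B, minlength=n_unique)

def sum_duplicates_add_at(B, inverseInd, n_unique):
    """Hypothetical alternative: unbuffered accumulation with np.add.at."""
    out = np.zeros(n_unique, dtype=np.float64)
    np.add.at(out, inverseInd, B)  # repeated indices are all accumulated
    return out

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])
B = np.array([10., 33., 15., 17.])

AUnique, inverseInd = np.unique(A, return_inverse=True, axis=0)
inverseInd = inverseInd.reshape(-1)  # precaution: ensure a 1-D inverse
BNew = sum_duplicates_bincount(B, inverseInd, len(AUnique))
print(BNew)  # [25. 33. 17.]

# Cross-check the two approaches on larger random data with many repeated rows
rng = np.random.default_rng(0)
A_big = rng.integers(0, 5, size=(100_000, 3)).astype(float)
B_big = rng.random(100_000)
U, inv = np.unique(A_big, return_inverse=True, axis=0)
inv = inv.reshape(-1)
s1 = sum_duplicates_bincount(B_big, inv, len(U))
s2 = sum_duplicates_add_at(B_big, inv, len(U))
assert np.allclose(s1, s2)
```

With `weights`, np.bincount returns a float64 array. The same call without weights, `np.bincount(inverseInd)`, reproduces `counts`.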
