Collecting identical rows of a NumPy array with a list comprehension



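I have a 2D NumPy array and want to collect the indices of identical rows using a list comprehension. My implementation below returns the desired result, but is there a better solution?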

import numpy as np

A = np.array([ 
[1,1,1,0,0,0],  #sample input data
[0,0,1,0,1,1],
[0,0,1,0,1,1],
[1,1,1,0,0,0],
[0,0,1,0,1,1],
[1,0,0,0,0,0],
[1,0,1,0,0,0],
[1,0,0,0,1,0],
[1,0,0,0,0,0]   
])


def gr_id_rows(Matrix):  #returns list of lists of identical row indices
    m = Matrix.shape[0]
    M = Matrix
    indices = list(range(m))
    lst_of_lsts_ident = []
    while len(M) > 0:
        lst_ident = []
        row_0 = M[0,:]
        M = np.delete(M, 0, 0)
        lst_ident.append(indices.pop(0))

        k = 0
        for row in M:
            if np.array_equal(row, row_0):
                M = np.delete(M, k, 0)
                lst_ident.append(indices.pop(k))
            else:
                k += 1
        lst_of_lsts_ident.append(lst_ident)
    return lst_of_lsts_ident

#execution
print( gr_id_rows(A) )   #[[0, 3], [1, 2, 4], [5, 8], [6], [7]] 

Notes on the real dataset:

  • Binary values only (0s and 1s)
  • The size can reach about 1000 x 700, but in most cases it is around 60 x 40.

Can this be done more elegantly with a list comprehension?

I made an attempt that (obviously) produces the wrong result.

nbr_rows = A.shape[0]
col_ind = range(A.shape[0])
ind_eq = [[k for k in col_ind if np.array_equal(A[k,:], A[h,:]) and k != h] for h in col_ind] 

print(ind_eq) #[[3], [2, 4], [1, 4], [0], [1, 2], [8], [], [], [5]]
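The attempt above goes wrong in two ways: the `k != h` condition drops each row's own index, and every group is emitted once per member. A minimal sketch of one possible fix, staying with list comprehensions, is to keep the row's own index and then retain a group only when `h` is its smallest member. The names `groups` and `unique_groups` are illustrative, not from the original code.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
])

n = A.shape[0]

# For every row h, list all rows k (including h itself) equal to it.
groups = [[k for k in range(n) if np.array_equal(A[k], A[h])] for h in range(n)]

# Each group shows up once per member; keep it only at its first member.
unique_groups = [g for h, g in enumerate(groups) if g[0] == h]

print(unique_groups)  # [[0, 3], [1, 2, 4], [5, 8], [6], [7]]
```

This still compares every pair of rows, so it is quadratic in the number of rows; that should be fine at around 60 x 40 but may be slow near 1000 x 700.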

Here is a solution that compares A with itself via numpy.equal (using broadcasting) and then uses itertools.groupby to reshape the output:

from itertools import groupby
a,b = np.equal(A, A[:,None]).all(2).nonzero()
{tuple(b[i] for i in g) for i,g in groupby(range(len(a)), lambda i:a[i])}

Output:

{(0, 3), (1, 2, 4), (5, 8), (6,), (7,)}
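One caveat with the broadcast comparison: `np.equal(A, A[:,None])` builds an intermediate boolean array of shape (n, n, m), which for a 1000 x 700 input is roughly 700 million entries. As a hedged alternative (not part of the answer above), the sketch below groups row indices in a single pass by using each row's raw bytes as a dictionary key. It assumes all rows share one dtype and hold exact values such as the 0/1 integers described in the question; the name `result` is illustrative.

```python
from collections import defaultdict

import numpy as np

A = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
])

# Identical rows have identical byte strings, so they land in the same bucket.
buckets = defaultdict(list)
for i, row in enumerate(A):
    buckets[row.tobytes()].append(i)

# dicts preserve insertion order, so groups appear in order of first occurrence.
result = list(buckets.values())

print(result)  # [[0, 3], [1, 2, 4], [5, 8], [6], [7]]
```

This visits each row once instead of comparing all pairs, at the cost of not being a list comprehension.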
