Efficient way to get a subset of indices in numpy

I have the following indices, as you would get them from np.where(...):

import numpy as np

coords = (
  np.asarray([0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6]),
  np.asarray([2, 2, 8, 2, 2, 4, 4, 6, 2, 2, 6, 2, 2, 4, 6, 2, 2, 6, 2, 2, 4, 4, 6, 2, 2, 6]),
  np.asarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
  np.asarray([0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
)

Another tuple of indices is meant to select the tuples in coords that should be kept:

index = (
  np.asarray([0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6]),
  np.asarray([2, 8, 2, 4, 4, 6, 2, 2, 6, 2, 2, 4, 6, 2, 2, 6, 2, 2, 4, 4, 6, 2, 2, 6]),
  np.asarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
  np.asarray([0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
)

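So the first coordinate tuple, (0, 2, 0, 0), is selected because it also appears in index (at position 0), but the second one, (0, 2, 0, 1), is not selected because it does not appear in index.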

I can easily compute the mask with [x in zip(*index) for x in zip(*coords)] (converted from bool to int for better readability):

[1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

but this is not very efficient for larger arrays. Is there a more "numpy-based" way to compute the mask?

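I am not sure about the efficiency, but since you are basically comparing coordinate tuples pairwise, you could use the scipy distance functions. Something along these lines: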

from scipy.spatial.distance import cdist
c = np.stack(coords).T
i = np.stack(index).T
d = cdist(c, i)
In [113]: np.any(d == 0, axis=1).astype(int)
Out[113]: 
array([1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1])
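
The same pairwise comparison can also be written with plain NumPy broadcasting, which compares the integers exactly instead of going through floating-point distances. The following is only a sketch on a shortened version of the data (the first five coordinate tuples and the first three index tuples); the names eq and mask are introduced here for illustration. Like cdist, it builds a full len(coords) x len(index) intermediate, so memory use grows with the product of the two lengths.

```python
import numpy as np

# Shortened example data (assumption: first few entries of the question's arrays)
coords = (
    np.asarray([0, 0, 0, 1, 1]),
    np.asarray([2, 2, 8, 2, 2]),
    np.asarray([0, 0, 0, 0, 0]),
    np.asarray([0, 1, 0, 0, 1]),
)
index = (
    np.asarray([0, 0, 1]),
    np.asarray([2, 8, 2]),
    np.asarray([0, 0, 0]),
    np.asarray([0, 0, 1]),
)

c = np.stack(coords, axis=1)  # one row per coordinate tuple, shape (5, 4)
i = np.stack(index, axis=1)   # one row per index tuple, shape (3, 4)

# eq[r, s] is True when row r of c equals row s of i in every column
eq = (c[:, None, :] == i[None, :, :]).all(axis=-1)

# A coordinate tuple is selected if it matches at least one index tuple
mask = eq.any(axis=1)
print(mask.astype(int))
```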

By default it uses the L2 norm; you could probably make it faster by using a simpler distance function, e.g.:

d = cdist(c,i, lambda u, v: np.all(np.equal(u,v)))
np.any(d != 0, axis=1).astype(int)
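
A Python callable passed to cdist is invoked once for every pair of rows, which tends to be slow for large inputs. One possible alternative, sketched here on shortened example data, is a built-in metric name such as "hamming": it returns the fraction of components that differ, so a distance of exactly 0 means the two rows are identical. The variable name mask is an illustrative choice, and whether this is actually faster for your data would need to be measured.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Shortened example data (assumption: first few entries of the question's arrays)
coords = (
    np.asarray([0, 0, 0, 1, 1]),
    np.asarray([2, 2, 8, 2, 2]),
    np.asarray([0, 0, 0, 0, 0]),
    np.asarray([0, 1, 0, 0, 1]),
)
index = (
    np.asarray([0, 0, 1]),
    np.asarray([2, 8, 2]),
    np.asarray([0, 0, 0]),
    np.asarray([0, 0, 1]),
)

c = np.stack(coords, axis=1)  # shape (5, 4)
i = np.stack(index, axis=1)   # shape (3, 4)

# Hamming distance: share of columns in which two rows disagree
d = cdist(c, i, metric="hamming")

# Distance 0 means every column agrees, i.e. the tuple occurs in index
mask = (d == 0).any(axis=1)
print(mask.astype(int))
```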

You can use np.ravel_multi_index to compress the columns into unique numbers, which are easier to handle:

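# Per-column maxima over both tuples give a shape that fits every coordinate
cmx = *map(np.max, coords),
imx = *map(np.max, index),
shape = np.maximum(cmx, imx) + 1
# Flatten each 4-tuple into a single unique integer
ct = np.ravel_multi_index(coords, shape)
it = np.ravel_multi_index(index, shape)
it.sort()
# Clip the positions so that values larger than every entry of `it`
# do not index one past the end
pos = np.minimum(it.searchsorted(ct), it.size - 1)
result = ct == it[pos]
print(result.view(np.int8))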

Prints:

[1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
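
Once the tuples have been flattened to single integers, the membership test itself could also be delegated to np.isin instead of sorting and searching by hand. This is a sketch on shortened example data rather than a drop-in replacement; the way shape is built here and the name mask are illustrative choices. Note that np.ravel_multi_index needs the product of the dimensions to fit into the index integer type, so this assumes reasonably small coordinate ranges.

```python
import numpy as np

# Shortened example data (assumption: first few entries of the question's arrays)
coords = (
    np.asarray([0, 0, 0, 1, 1]),
    np.asarray([2, 2, 8, 2, 2]),
    np.asarray([0, 0, 0, 0, 0]),
    np.asarray([0, 1, 0, 0, 1]),
)
index = (
    np.asarray([0, 0, 1]),
    np.asarray([2, 8, 2]),
    np.asarray([0, 0, 0]),
    np.asarray([0, 0, 1]),
)

# Smallest shape that contains every coordinate of both tuples
shape = tuple(int(max(a.max(), b.max())) + 1 for a, b in zip(coords, index))

# Flatten each 4-tuple into one integer; equal tuples map to equal integers
ct = np.ravel_multi_index(coords, shape)
it = np.ravel_multi_index(index, shape)

# Element-wise membership test of ct in it
mask = np.isin(ct, it)
print(mask.astype(int))
```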
