Numpy broadcasting difficulty

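I have two 2D point clouds (oldPts and newPts) that I want to combine. They are m x 2 and n x 2 numpy integer arrays, with m and n on the order of 2000. newPts contains many duplicates or near-duplicates of points in oldPts, which I need to remove before combining.

So far I have used the histogram2d function to produce a 2D representation of oldPts (H). I then compare each newPt against an NxN region of H and, if that region is empty, I accept the point. This last part is currently done with a Python loop, which I would like to remove. Can anyone show me how to do this with broadcasting, or suggest a completely different approach to the problem? Working code is below.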

import numpy as np

# path, datasetNo, xMax and yMax are defined elsewhere
# raw string so that '\t' is not read as a tab character
npzfile = np.load(path + datasetNo + r'\temp.npz')
arrs = npzfile.files
oldPts = npzfile[arrs[0]]
newPts = npzfile[arrs[1]]
# remove all points with a negative coordinate
oldPts = oldPts[oldPts.min(axis=1) >= 0, :]
newPts = newPts[newPts.min(axis=1) >= 0, :]
# round to integers
oldPts = np.around(oldPts).astype(int)
newPts = newPts.astype(int)
# put the oldPts into a 2d array with one bin per integer coordinate
H, xedg, yedg = np.histogram2d(oldPts[:, 0], oldPts[:, 1],
                               bins=[xMax, yMax],
                               range=[[0, xMax], [0, yMax]])
finalNewList = []
N = 5
for pt in newPts:
    # accept the point only if no old point falls in the window around it
    if not H[max(0, pt[0] - N):min(xMax, pt[0] + N),
             max(0, pt[1] - N):min(yMax, pt[1] + N)].any():
        finalNewList.append(pt)
finalNew = np.array(finalNewList)

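The right way to do this is to use linear algebra to compute the distance between each pair of 2-long vectors, and then accept only the new points that are "different enough" from every old point. Use scipy.spatial.distance.cdist: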

import numpy as np
from scipy.spatial.distance import cdist

oldPts = np.random.randn(1000, 2)
newPts = np.random.randn(2000, 2)
dist = cdist(oldPts, newPts)
print(dist.shape)  # (1000, 2000)
# a new point is OK only if even its nearest old point is farther than the threshold
okIndex = np.min(dist, axis=0) > 5
print(np.sum(okIndex))  # number of accepted points; depends on the data and the threshold
finalNew = newPts[okIndex, :]
print(finalNew.shape)  # (number of accepted points, 2)
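
Since the question asks about broadcasting, here is a rough sketch of the same pairwise-distance computation written with plain NumPy broadcasting and checked against cdist. The seeded integer data, the array sizes and the names diff, dist_bc, dist_cd and nearest are made up for illustration; they are not from the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data only (assumed sizes and value range, fixed seed)
rng = np.random.default_rng(0)
oldPts = rng.integers(0, 100, size=(50, 2))
newPts = rng.integers(0, 100, size=(80, 2))

# Broadcasting: (50, 1, 2) - (1, 80, 2) -> (50, 80, 2) array of coordinate differences
diff = oldPts[:, None, :] - newPts[None, :, :]
dist_bc = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distances, shape (50, 80)

# The same matrix computed by SciPy
dist_cd = cdist(oldPts, newPts)
same = np.allclose(dist_bc, dist_cd)
print(same)

# Keep a new point only if its nearest old point is farther away than the threshold
threshold = 5
nearest = dist_bc.min(axis=0)  # one value per new point
finalNew = newPts[nearest > threshold]
print(finalNew.shape)
```

Note that the intermediate diff array holds m x n x 2 values, so for m and n around 2000 it is a few million elements; cdist avoids building that intermediate array.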

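Above I use a Euclidean distance of 5 as the threshold for "too close": any point in newPts that is farther than 5 from every point in oldPts is accepted into finalNew. You will have to look at the range of values in dist to find a good threshold, but your histogram can guide you toward a good choice.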
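
One way to look at the range of values when picking a threshold is to reduce dist to the nearest-old-point distance for each new point and inspect its distribution. The sketch below does that on synthetic data in which the first 100 new points are deliberately jittered copies of old points; the data, the jitter size and the name nearest are all assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Synthetic data (assumed): 300 old points, 100 near-duplicates, 100 fresh points
rng = np.random.default_rng(1)
oldPts = rng.integers(0, 200, size=(300, 2))
near_dupes = oldPts[:100] + rng.integers(-1, 2, size=(100, 2))  # jitter of -1, 0 or +1
fresh = rng.integers(0, 200, size=(100, 2))
newPts = np.vstack([near_dupes, fresh])

dist = cdist(oldPts, newPts)
nearest = dist.min(axis=0)  # distance from each new point to its closest old point

# Percentiles give a quick view of where near-duplicates end and new points begin
for q in (10, 25, 50, 75, 90):
    print(q, np.percentile(nearest, q))

# Number of new points that would survive a few candidate thresholds
for threshold in (1, 2, 5, 10):
    print(threshold, int((nearest > threshold).sum()))
```

If the near-duplicates are well separated from the genuinely new points, the survivor count should change little over a range of thresholds, and a value in that range is a reasonable pick.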

(A good way to visualize dist is to use matplotlib.pyplot.imshow(dist).)

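This is a more refined version of what you are doing with the histogram. In fact, assuming the histogram bin widths are the same in both dimensions, and again using 5 as the threshold, you should be able to get essentially the same answer as the histogram by passing metric='chebyshev' to cdist: the square window you test around each point corresponds to the maximum-coordinate (Chebyshev) distance, whereas metric='minkowski', p=1 would give a diamond-shaped neighborhood instead.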
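
To check how closely a distance test matches the histogram window, here is a sketch that runs the loop from the question on small synthetic data, reproduces it with broadcasting on signed coordinate offsets, and compares both with a cdist test. Using metric='chebyshev' here is my assumption, on the grounds that a square window corresponds to the maximum-coordinate distance. The data and the names accept_loop, accept_bc and accept_cheb are made up for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Synthetic integer points inside [0, xMax) x [0, yMax) (assumed sizes)
rng = np.random.default_rng(2)
xMax = yMax = 60
N = 5
oldPts = rng.integers(0, xMax, size=(40, 2))
newPts = rng.integers(0, xMax, size=(200, 2))

# Reference: the histogram plus Python loop from the question
H, _, _ = np.histogram2d(oldPts[:, 0], oldPts[:, 1],
                         bins=[xMax, yMax], range=[[0, xMax], [0, yMax]])
accept_loop = np.array([
    not H[max(0, x - N):min(xMax, x + N), max(0, y - N):min(yMax, y + N)].any()
    for x, y in newPts
])

# Broadcasting: signed offset of every old point from every new point
d = oldPts[:, None, :] - newPts[None, :, :]      # shape (n_old, n_new, 2)
in_window = ((d >= -N) & (d < N)).all(axis=-1)   # old point lies inside the half-open window
accept_bc = ~in_window.any(axis=0)

# Distance-based test with the Chebyshev (maximum-coordinate) metric
cheb = cdist(oldPts, newPts, metric='chebyshev')
accept_cheb = cheb.min(axis=0) > N

print(np.array_equal(accept_loop, accept_bc))
print(int(accept_loop.sum()), int(accept_cheb.sum()))
```

The slice in the loop covers offsets from -N up to N - 1, so the Chebyshev test is slightly stricter: it also rejects a new point when an old point sits at an offset of exactly +N. The two can therefore differ on a few borderline points.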

(P.S. If you are interested in another useful function in scipy.spatial.distance, take a look at my answer that uses pdist to find unique rows/columns in an array.)
