Fast way to find duplicate coordinates

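I am trying to write a program that finds duplicate coordinates (x, y, z) in a 3D array. The script should flag one or more duplicate points within a given tolerance; a point can have more than one duplicate. I found many different approaches, including ones based on sorting.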
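Written out, the tolerance test that the code below applies compares each axis separately, so "within tolerance" means a maximum-norm (Chebyshev) distance below tol rather than a Euclidean distance:

```latex
\mathbf{p}_j \text{ duplicates } \mathbf{p}_i
\iff |x_j - x_i| < \mathrm{tol} \;\wedge\; |y_j - y_i| < \mathrm{tol} \;\wedge\; |z_j - z_i| < \mathrm{tol}
\iff \max_{k \in \{x,y,z\}} |p_{j,k} - p_{i,k}| < \mathrm{tol}
```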

To try out the code, I created the following test data set:

21.9799629872016 57.4044376777929 0
22.7807110172432 57.6921361034533 0
28.660840151287 61.5676757599822 0
28.6608401512 61.56767575998 0
30.6654296288019 56.2221038199424 0
20.3752036442253 49.1392209993897 0
32.8036584048178 43.927288357851 0
35.8105426210901 51.9456462679106 0
40.8888359641279 58.6944308422108 0
40.88883596412 70.6944308422108 0
41.0892949118794 58.1598736482068 0
39.6860822776189 64.775018924006 0
39.1515250836149 64.8418385732565 0
8.21402748063493 63.5054455882466 0
8.2140275006 63.5074455882 0
8.21404548063493 63.5064455882466 0
8.2143214806 63.5084455882 0

The code I came up with is:

import numpy as np

# nodes is assumed to be an (N, 3) NumPy array of x, y, z coordinates (e.g. the test data above)
# given tolerance
tol = 0.01
# initialize empty list for the found duplicates
duplicates = []
# loop over all nodes
for i in range(0, len(nodes)):
    # current node
    curr_node = nodes[i]
    # create difference vector
    diff = nodes - curr_node

    # get all duplicate indices (the node itself is found as well)
    condition = np.where((abs(diff[:, 0]) < tol) & (abs(diff[:, 1]) < tol) & (abs(diff[:, 2]) < tol))
    # check if more than one entry is present. If larger than 1, duplicate points exist
    if len(condition[0]) > 1:
        # loop over all found duplicate points
        for j in range(0, len(condition[0])):
            # add duplicate if not already marked as duplicate
            if j > 0 and condition[0][j] not in duplicates:
                duplicates.append(condition[0][j])

This code returns what I expect:

duplicates = [3, 14, 15, 16]

However, the code is very slow: for 300,000 points it takes about 10 minutes. Is there a faster way to do this?
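A rough count shows why the loop is slow. Each iteration of `for i in range(0, len(nodes))` subtracts `curr_node` from all N rows and tests three conditions per row, so the number of point pairs examined grows quadratically with N (on top of that, `condition[0][j] not in duplicates` is a linear scan of a Python list):

```latex
\underbrace{N}_{\text{iterations}} \times \underbrace{N}_{\text{rows per } \texttt{diff}} = N^2,
\qquad N = 3 \times 10^{5} \;\Rightarrow\; N^2 = 9 \times 10^{10} \ \text{point pairs}
```

Any approach that only compares points that are already known to be close avoids this N² term.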

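You could place the points in a grid of cubes whose edge length equals the tolerance. Then, for each point, you only need to check the points from the same cube plus those from the 26 adjacent cubes, instead of all other points.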

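from collections import defaultdict
from itertools import product

def cube_of(p, tolerance):
    # index of the tolerance-sized cube containing p (floor keeps negative coordinates consistent)
    return tuple(int(c // tolerance) for c in p)

def adjacent_cubes(cube):
    # the cube itself plus its 26 neighbours
    for offset in product((-1, 0, 1), repeat=3):
        yield tuple(c + d for c, d in zip(cube, offset))

def find_duplicates(points, tolerance):
    # compute the grid: cube -> indices of the points inside it
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[cube_of(p, tolerance)].append(i)
    # check: a point is a duplicate if an earlier point in its own or an
    # adjacent cube is within tolerance on every axis (same per-axis test as the question)
    duplicates = []
    for i, p in enumerate(points):
        if any(j < i and all(abs(a - b) < tolerance for a, b in zip(p, points[j]))
               for adj in adjacent_cubes(cube_of(p, tolerance))
               for j in grid.get(adj, ())):
            duplicates.append(i)
    return duplicates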

You can sort the nodes in advance to reduce the number of loops needed:

import timeit
import random
nodes = [
[21.9799629872016, 57.4044376777929, 0],
[22.7807110172432, 57.6921361034533, 0],
[28.660840151287, 61.5676757599822, 0], [28.6608401512, 61.56767575998, 0],
[30.6654296288019, 56.2221038199424, 0],
[20.3752036442253, 49.1392209993897, 0],
[32.8036584048178, 43.927288357851, 0],
[35.8105426210901, 51.9456462679106, 0],
[40.8888359641279, 58.6944308422108, 0],
[40.88883596412, 70.6944308422108, 0],
[41.0892949118794, 58.1598736482068, 0],
[39.6860822776189, 64.775018924006, 0],
[39.1515250836149, 64.8418385732565, 0],
[8.21402748063493, 63.5054455882466, 0], [8.2140275006, 63.5074455882, 0],
[8.21404548063493, 63.5064455882466, 0], [8.2143214806, 63.5084455882, 0]
]
duplicates = [3, 14, 15, 16]
assertList = [n for i, n in enumerate(nodes) if i in duplicates]

def new(nodes, tol=0.01):
    print(f"Searching duplicates in {len(nodes)} nodes")
    coordinateLen = range(len(nodes[0]))
    # sort in place, lexicographically (x, then y, then z), so near-identical nodes become neighbours
    nodes.sort()
    last = nodes[0]
    duplicates = []
    for i, node in enumerate(nodes[1:]):
        # a node is a duplicate of `last` if every coordinate lies within [0, tol) above it
        if not all(0 <= node[idx] - last[idx] < tol for idx in coordinateLen):
            last = node
        else:
            duplicates.append(node)
    print(f"Found: {len(duplicates)} duplicates")
    return duplicates

# generate random numbers!
randomNodes = [
    [random.uniform(0, 100),
     random.uniform(0, 100),
     random.uniform(0, 1)] for _ in range(300000)
]
# make sure there are at least the same 4 duplicates!
randomNodes += nodes
for i, lst in enumerate((nodes, randomNodes)):
    for func in ("new", ):
        t1 = timeit.Timer(f"{func}({lst})", f"from __main__ import {func}")
        # check that 4 duplicates are found in the test data; after sorting, the
        # reported node of a pair can be the other member (e.g. node 2 instead of node 3)
        if i == 0:
            print(len(new(nodes)) == len(assertList))
        print(f"{func} took: {t1.timeit(number=10)} seconds")
        print("")

Output:

Searching duplicates in 17 nodes
Found: 4 duplicates
True
....
new took: 0.00034904800000001845 seconds
Searching duplicates in 300017 nodes
Found: 4 duplicates
...
new took: 14.316181525000001 seconds
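One caveat about `new`: it compares each node only with the last non-duplicate node before it in lexicographic order and requires every coordinate difference to be non-negative, so a near-duplicate whose y is slightly smaller, or one separated from its partner by an unrelated node with almost the same x, is not reported. Below is a minimal sketch of a sort-and-sweep variant (a swapped-in technique, not the code from this answer; `sweep_duplicates` is a hypothetical name). It sorts only by x, scans forward while the x gap is below `tol`, and checks the remaining axes with `abs`. A node is flagged when it lies within `tol` on every axis of some node with a smaller index.

```python
def sweep_duplicates(nodes, tol=0.01):
    """Return the sorted indices of nodes that lie within `tol` on every axis
    of some node with a smaller index."""
    # sort indices by x only; the original `nodes` list is left untouched
    order = sorted(range(len(nodes)), key=lambda i: nodes[i][0])
    found = set()
    for a, i in enumerate(order):
        b = a + 1
        # nodes later in `order` have x >= nodes[i][0]; stop once the x gap reaches tol
        while b < len(order) and nodes[order[b]][0] - nodes[i][0] < tol:
            j = order[b]
            # x is already within tol, so only the remaining axes need checking
            if all(abs(nodes[i][k] - nodes[j][k]) < tol for k in range(1, len(nodes[i]))):
                found.add(max(i, j))  # flag the later-indexed node of the pair
            b += 1
    return sorted(found)


nodes = [
    (21.9799629872016, 57.4044376777929, 0), (22.7807110172432, 57.6921361034533, 0),
    (28.660840151287, 61.5676757599822, 0), (28.6608401512, 61.56767575998, 0),
    (30.6654296288019, 56.2221038199424, 0), (20.3752036442253, 49.1392209993897, 0),
    (32.8036584048178, 43.927288357851, 0), (35.8105426210901, 51.9456462679106, 0),
    (40.8888359641279, 58.6944308422108, 0), (40.88883596412, 70.6944308422108, 0),
    (41.0892949118794, 58.1598736482068, 0), (39.6860822776189, 64.775018924006, 0),
    (39.1515250836149, 64.8418385732565, 0), (8.21402748063493, 63.5054455882466, 0),
    (8.2140275006, 63.5074455882, 0), (8.21404548063493, 63.5064455882466, 0),
    (8.2143214806, 63.5084455882, 0),
]
print(sweep_duplicates(nodes))  # [3, 14, 15, 16]
```

When the x values are spread out, each node is only compared with the few nodes inside its x window; if many nodes share nearly the same x, the scan degrades toward the N² behaviour of the original loop, and the grid of cubes is the more robust choice.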
