I have 5 numpy arrays:
array_1 = [1,2,3]
array_2 = [4,5,6]
array_3 = [7,8,9]
array_4 = [10,11,12]
array_5 = [1,2,3]
I need to compare them. Essentially, if any of the 5 arrays above have the same values (at the same indices), I need to know about it:
index_array_1 = np.where(array_1 == array_2)[0]
index_array_2 = np.where(array_1 == array_3)[0]
index_array_3 = np.where(array_1 == array_4)[0]
index_array_4 = np.where(array_1 == array_5)[0]
index_array_5 = np.where(array_2 == array_3)[0]
index_array_6 = np.where(array_2 == array_4)[0]
index_array_7 = np.where(array_2 == array_5)[0]
index_array_8 = np.where(array_3 == array_4)[0]
index_array_9 = np.where(array_3 == array_5)[0]
index_array_10 = np.where(array_4 == array_5)[0]
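So in this case only index_array_4 would return anything, because array_1 and array_5 match. However, this is clearly not the best approach: it is a lot of code, and it also takes a while to run.
Is there something I haven't come across yet that lets me essentially say "if any of the 5 arrays match, tell me, and let me know which two arrays match"?
I would also like it to return an array of the indices of the matching arrays.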
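For reference, the ten hand-written comparisons above can be generated in a loop with itertools.combinations. This is a minimal sketch, assuming the inputs are equal-length 1-D NumPy arrays held in a dict (the dict and the name `matches` are illustrative, not from the question). It keeps the question's element-wise `np.where(a == b)[0]` test and stores only the pairs that share at least one value at the same index.

```python
from itertools import combinations

import numpy as np

# Hypothetical container: the question's arrays keyed by name
arrays = {
    "array_1": np.array([1, 2, 3]),
    "array_2": np.array([4, 5, 6]),
    "array_3": np.array([7, 8, 9]),
    "array_4": np.array([10, 11, 12]),
    "array_5": np.array([1, 2, 3]),
}

# For every unordered pair, record the positions where both arrays hold the
# same value (the same test as np.where(a == b)[0] in the question)
matches = {}
for (name_a, a), (name_b, b) in combinations(arrays.items(), 2):
    idx = np.where(a == b)[0]
    if idx.size:  # keep only pairs with at least one matching position
        matches[(name_a, name_b)] = idx

print(matches)
```

Only the pairs that actually match end up in `matches`, so an empty dict means no two arrays share a value at the same index.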
You could try a one-liner:
>>> from itertools import combinations
>>> [arrays for arrays in combinations([f"array_{i}" for i in range(1,6)],2)
...  if np.all(np.equal(*map(globals().get,arrays)))]
Output:
[('array_1', 'array_5')]
Explanation:
>>> [f"array_{i}" for i in range(1,6)]
['array_1', 'array_2', 'array_3', 'array_4', 'array_5']
>>> list(combinations([f"array_{i}" for i in range(1,6)],2))
[('array_1', 'array_2'),
('array_1', 'array_3'),
('array_1', 'array_4'),
('array_1', 'array_5'),
('array_2', 'array_3'),
('array_2', 'array_4'),
('array_2', 'array_5'),
('array_3', 'array_4'),
('array_3', 'array_5'),
('array_4', 'array_5')]
Now it iterates over these combinations.
Taking the first element (i.e. the first iteration), the remaining steps look like this:
>>> [*map(globals().get, ('array_1', 'array_2'))]
[[1, 2, 3], [4, 5, 6]]
>>> np.all(np.equal([1, 2, 3], [4, 5, 6]))
False
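The lookup above relies on globals().get, which only finds module-level names. A minimal alternative sketch, assuming you are free to keep the arrays in a dict (the dict and the name `equal_pairs` are illustrative), iterates over the dict keys instead. It also swaps np.all(np.equal(...)) for np.array_equal, which returns False for arrays of different shapes instead of raising a broadcasting error.

```python
from itertools import combinations

import numpy as np

# Hypothetical container replacing the separate array_N variables
arrays = {
    "array_1": [1, 2, 3],
    "array_2": [4, 5, 6],
    "array_3": [7, 8, 9],
    "array_4": [10, 11, 12],
    "array_5": [1, 2, 3],
}

# combinations() over a dict yields pairs of keys in insertion order;
# keep the pairs whose arrays have the same shape and the same elements
equal_pairs = [
    (a, b)
    for a, b in combinations(arrays, 2)
    if np.array_equal(arrays[a], arrays[b])
]
print(equal_pairs)  # [('array_1', 'array_5')]
```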
Edit:
If the code is inside a function, globals() will not see the local arrays, so try:
from itertools import combinations
import numpy as np

def bar():
    array_1 = [1, 2, 3]
    array_2 = [4, 5, 6]
    array_3 = [7, 8, 9]
    array_4 = [10, 11, 12]
    array_5 = [1, 2, 3]
    scope = locals()  # the function's local names, used as the namespace for eval()
    return [arrays for arrays in combinations([f"array_{i}" for i in range(1,6)],2)
            if np.all(eval(arrays[0],scope) == eval(arrays[1],scope))]
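Inside a function you can also avoid eval() entirely by passing the arrays in as arguments. This is a minimal sketch (the helper `equal_pairs` is hypothetical, not part of NumPy) that returns the positions of the matching arrays as a (k, 2) integer array, which also covers the request for an index array of the matches.

```python
from itertools import combinations

import numpy as np


def equal_pairs(*arrays):
    """Return a (k, 2) array of index pairs (i, j), i < j, whose inputs are equal."""
    pairs = [
        (i, j)
        for i, j in combinations(range(len(arrays)), 2)
        if np.array_equal(arrays[i], arrays[j])
    ]
    # reshape keeps a (0, 2) shape when nothing matches
    return np.array(pairs, dtype=int).reshape(-1, 2)


def bar():
    array_1 = [1, 2, 3]
    array_2 = [4, 5, 6]
    array_3 = [7, 8, 9]
    array_4 = [10, 11, 12]
    array_5 = [1, 2, 3]
    return equal_pairs(array_1, array_2, array_3, array_4, array_5)


print(bar())  # [[0 4]]
```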
You can do it like this:
import numpy as np
array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]
# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compare all vs all
c = np.all(all_arrays[:, np.newaxis] == all_arrays, axis=-1)
# Take only half the result to avoid self results and symmetric results
c = np.triu(c, 1)
# Get matching pairs
m = np.stack(np.where(c), axis=1)
# One row per matching pair
print(m)
# [[0 4]]
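The "compare all vs all" step works through broadcasting. A short sketch of the intermediate shapes, using the same five arrays; np.argwhere is shown as an equivalent of np.stack(np.where(...), axis=1).

```python
import numpy as np

all_arrays = np.stack([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]])

# (5, 1, 3) compared with (5, 3) broadcasts to (5, 5, 3):
# element [i, j, k] is all_arrays[i, k] == all_arrays[j, k]
elementwise = all_arrays[:, np.newaxis] == all_arrays
print(elementwise.shape)  # (5, 5, 3)

# Reducing over the last axis gives a (5, 5) matrix: True where row i equals row j
c = np.all(elementwise, axis=-1)

# Keep the strict upper triangle, then list the (row, column) positions of True
m = np.argwhere(np.triu(c, 1))
print(m)  # [[0 4]]
```

Note that the (5, 5, 3) intermediate grows with the square of the number of arrays, which is why the pairwise approach below can be attractive for larger inputs.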
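However, this performs more comparisons than necessary (for example, array_1 against array_2 and also array_2 against array_1). You could instead use something like scipy.spatial.distance.pdist, which evaluates each pair only once, to potentially save some time: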
import numpy as np
import scipy.spatial.distance
array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]
# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compute pairwise distances
d = scipy.spatial.distance.pdist(all_arrays, 'hamming')
d = scipy.spatial.distance.squareform(d)
# Get indices of pairs where it is zero
c = np.triu(d == 0, 1)
m = np.stack(np.where(c), axis=1)
print(m)
# [[0 4]]
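If the number of arrays is large, a different technique avoids pairwise comparisons altogether: np.unique with axis=0 sorts the rows and gives identical rows the same label. This is a sketch of that swapped-in, sort-based approach, not of pdist; it assumes all arrays have the same length so they can be stacked.

```python
import numpy as np

all_arrays = np.stack([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]])

# Identical rows receive the same label in `inverse`; `counts` holds the group sizes
_, inverse, counts = np.unique(all_arrays, axis=0, return_inverse=True, return_counts=True)
# Some NumPy 2.0.x releases return a 2-D inverse when axis is given; ravel() handles both
inverse = inverse.ravel()

# Indices of the input arrays, grouped by identical content (groups of size > 1 only)
groups = [np.flatnonzero(inverse == label) for label in np.flatnonzero(counts > 1)]
print(groups)  # [array([0, 4])]
```

Each group can be expanded into pairs with itertools.combinations if you need the same (i, j) output as above.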
You can use the .count() method to check whether more than one instance of an array occurs among the arrays:
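def compare(*arrays):
    # Convert every input array to a plain list so list.count()/list.index() can compare them
    temp = [list(x) for x in arrays]
    for i in range(len(temp)):
        if temp.count(temp[i]) > 1:
            # Search only after position i, then shift back to an index in the full list
            return (i, i + 1 + temp[i + 1:].index(temp[i]))
    # No array occurs more than once
    return False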
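The first line of the function builds a list containing all the arrays passed in, each converted to a list. If the current element temp[i] occurs more than once in that list, the function returns i together with the index of the other, identical array. To find that index, it calls .index() only on the part of the list after position i, so that temp[i] itself is not found again, and converts the result back to a position in the full list.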
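The .count() approach stops at the first duplicate it finds and scans the whole list once per element. If you need every matching pair, a dictionary keyed by the array contents finds them in a single pass. This is a minimal sketch assuming 1-D arrays (so that tuple(arr) is a hashable copy of the contents); the function name `all_equal_pairs` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_equal_pairs(*arrays):
    # Map each array's contents (as a hashable tuple) to the positions where it occurs
    positions = defaultdict(list)
    for i, arr in enumerate(arrays):
        positions[tuple(arr)].append(i)
    # Every pair of positions that share the same contents is a match
    return [pair for idx in positions.values() for pair in combinations(idx, 2)]


print(all_equal_pairs([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]))
# [(0, 4)]
```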
print(compare(array_1,array_2,array_3,array_4,array_5))
prints
(0, 4)