Conditional broadcast assignment into a NumPy matrix using an index array



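I have a NumPy matrix filled with some values (I use zeros and two tuples so that the example stays easy to read while still covering both conditions):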

nparray = array([[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
[0., (2, 2.5), 0., 0., 0., 0., 0., 0., 0., 0.],
[0., (1, 6.5), 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.]])

I have a sub-matrix, the result of some computation, that needs to be assigned to specific index positions in nparray:

sub_array= array([[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
[(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
[(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
[(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=object)
index = [1, 2, 5, 9]

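I need to assign the values of sub_array to the positions of nparray given by the index combinations, but only where the value in nparray is not a tuple, or where the second item of the tuple in sub_array is smaller than the second item of the tuple already stored at that position in nparray. The result should look like this (I added the indices along the top and left to make the assignment positions clear):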

--------Index-----  0 | 1       | 2        | 3 | 4 | 5        | 6 | 7 | 8 | 9        
nparray =   array([[0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
|       1     [0., (2, 2.5), (2, 3.2) , 0., 0., (3, 4.6) , 0., 0., 0., (4, 3.4)],
|       2     [0., (3, 4.5), (4, 0.4) , 0., 0., (5, 3.2) , 0., 0., 0., (6, 2.3)],
i       3     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
n       4     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
d       5     [0., (3, 4.5), (5, 2.3) , 0., 0., (7, 5.3) , 0., 0., 0., (9, 2.3)],
e       6     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
x       7     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
|       8     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
|       9     [0.,(12, 3.2), (45, 2.4), 0., 0., (32, 2.3), 0., 0., 0., (6, 5.4)]])
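
You can see that sub_array is assigned at every position given by the combinations of the index array. The tuple at position (1,1) is not replaced, because the second item in nparray (2.5) is smaller than the second item in sub_array (3.2). The tuple at position (2,1), on the other hand, is replaced, because the second item in nparray (6.5) is larger than the second item in sub_array (4.5).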

How can I implement this conditional assignment with NumPy in a time-efficient way, rather than with loops?

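PS: My main goal is to compute a distance matrix based on some prior filtering. My dataset has 110K entries, and if I run the whole set rather than a subset of it, the computation would take half a year to finish. Thanks in advance!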

Here is the basic indexed assignment:

In [60]: index = np.array([1,2,6,7]); data = np.arange(16).reshape(4,4)
In [62]: res = np.zeros((10,10),int)

We can select a (4,4) block of values:

In [63]: res[index[:,None],index]
Out[63]: 
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])

and assign the (4,4) data to it:

In [64]: res[index[:,None],index] = data
In [65]: res
Out[65]: 
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
[ 0,  0,  1,  0,  0,  0,  2,  3,  0,  0],
[ 0,  4,  5,  0,  0,  0,  6,  7,  0,  0],
[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
[ 0,  8,  9,  0,  0,  0, 10, 11,  0,  0],
[ 0, 12, 13,  0,  0,  0, 14, 15,  0,  0],
[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])
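
The index[:,None] form works because a (4,1) column of row indices broadcasts against the (4,) row of column indices, so every (row, column) combination is addressed. As a small sketch (res_a and res_b are names made up for this illustration), np.ix_ builds the same open mesh explicitly and should give an identical result:

```python
import numpy as np

index = np.array([1, 2, 6, 7])
data = np.arange(16).reshape(4, 4)

# Variant 1: broadcast a (4, 1) column of row indices against a (4,) row of column indices
res_a = np.zeros((10, 10), int)
res_a[index[:, None], index] = data

# Variant 2: let np.ix_ build the (4, 1) and (1, 4) index arrays
res_b = np.zeros((10, 10), int)
res_b[np.ix_(index, index)] = data

print(np.array_equal(res_a, res_b))
```

Either spelling addresses the same 16 cells; np.ix_ mostly helps readability when the row and column index lists differ.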

What exactly is your sub_array? If I just copy-and-paste it, I get a (4,4,2) array:

In [67]: sub_array= np.array([[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=object)
In [68]: sub_array.shape
Out[68]: (4, 4, 2)

That cannot be assigned to the (4,4) block of res, because the shapes do not match.
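
A quick way to see the mismatch is to compare the shape of the value with the shape of the indexed block. The sketch below uses an object-dtype res so that only the shapes are at issue (that choice, and the names rows, target_shape and failed, are assumptions of this example):

```python
import numpy as np

index = np.array([1, 2, 6, 7])
res = np.zeros((10, 10), dtype=object)  # object dtype, so no numeric casting is involved

rows = [[(1, 3.2), (2, 3.2), (3, 4.6), (4, 3.4)],
        [(3, 4.5), (4, 0.4), (5, 3.2), (6, 2.3)],
        [(3, 4.5), (5, 2.3), (7, 5.3), (9, 2.3)],
        [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]]

# np.array treats each tuple as a sequence, so a third axis of length 2 appears
sub_array = np.array(rows, dtype=object)

# Shape of the block that the assignment would have to fill
target_shape = res[index[:, None], index].shape

try:
    res[index[:, None], index] = sub_array
    failed = False
except ValueError:
    # a (4, 4, 2) value cannot be broadcast to a (4, 4) indexing result
    failed = True

print(sub_array.shape, target_shape, failed)
```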

You can create a (4,4) array whose elements are tuples:

In [69]: sub_array= np.empty((4,4),object) 
...: sub_array[:] = [[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]]
In [70]: sub_array
Out[70]: 
array([[(1, 3.2), (2, 3.2), (3, 4.6), (4, 3.4)],
[(3, 4.5), (4, 0.4), (5, 3.2), (6, 2.3)],
[(3, 4.5), (5, 2.3), (7, 5.3), (9, 2.3)],
[(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=object)

and assign its values to another object-dtype array:

In [71]: res = np.zeros((10,10),object)
In [73]: res[index[:,None],index] = sub_array
In [74]: res
Out[74]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, (1, 3.2), (2, 3.2), 0, 0, 0, (3, 4.6), (4, 3.4), 0, 0],
[0, (3, 4.5), (4, 0.4), 0, 0, 0, (5, 3.2), (6, 2.3), 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, (3, 4.5), (5, 2.3), 0, 0, 0, (7, 5.3), (9, 2.3), 0, 0],
[0, (12, 3.2), (45, 2.4), 0, 0, 0, (32, 2.3), (6, 5.4), 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=object)

In fact, I can start from the nested list of tuples and skip sub_array altogether:

In [75]: res = np.empty((10,10),object)
In [76]: alist = [[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]]
In [77]: res[index[:,None],index] = alist
In [78]: res
Out[78]: 
array([[None, None, None, None, None, None, None, None, None, None],
[None, (1, 3.2), (2, 3.2), None, None, None, (3, 4.6), (4, 3.4),
None, None],
[None, (3, 4.5), (4, 0.4), None, None, None, (5, 3.2), (6, 2.3),
None, None],
[None, None, None, None, None, None, None, None, None, None],
[None, None, None, None, None, None, None, None, None, None],
[None, None, None, None, None, None, None, None, None, None],
[None, (3, 4.5), (5, 2.3), None, None, None, (7, 5.3), (9, 2.3),
None, None],
[None, (12, 3.2), (45, 2.4), None, None, None, (32, 2.3),
(6, 5.4), None, None],
[None, None, None, None, None, None, None, None, None, None],
[None, None, None, None, None, None, None, None, None, None]],
dtype=object)

Another option is to start with a numeric (10,10,2) res and copy the (4,4,2) data into it.

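Edit: I just noticed the additional constraint on which tuple to keep when one is already present at a given index. I'm on my way out, but maybe this is enough to get the OP started.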
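
For what it's worth, here is a rough sketch of how the numeric (10,10,2) layout could handle that constraint without a Python loop. It assumes NaN is used to mark "no tuple stored here", which is not part of the question, and the names block, current and take_new are made up for the illustration:

```python
import numpy as np

index = np.array([1, 2, 5, 9])
block = np.ix_(index, index)          # addresses the 4x4 grid of target cells

# Assumption: NaN in both slots means the cell holds no pair yet
res = np.full((10, 10, 2), np.nan)
res[1, 1] = (2, 2.5)
res[2, 1] = (1, 6.5)

sub = np.array([[(1, 3.2), (2, 3.2), (3, 4.6), (4, 3.4)],
                [(3, 4.5), (4, 0.4), (5, 3.2), (6, 2.3)],
                [(3, 4.5), (5, 2.3), (7, 5.3), (9, 2.3)],
                [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=float)

current = res[block]                  # copy of the target cells, shape (4, 4, 2)
empty = np.isnan(current[..., 1])     # cells with nothing stored

with np.errstate(invalid="ignore"):   # NaN comparisons evaluate to False
    smaller = sub[..., 1] < current[..., 1]

take_new = empty | smaller            # the rule from the question, as a (4, 4) mask

# Broadcast the mask over the last axis so both items of a pair move together
res[block] = np.where(take_new[..., None], sub, current)

print(res[1, 1], res[2, 1])
```

The comparison and the selection both run on whole arrays, so the cost per update is a handful of NumPy calls regardless of how many cells the index list addresses.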

I believe this gets you what you want, using some fairly simple indexing:

In [34]: arr = np.array([[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., (2, 2.5), 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., (1, 6.5), 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=object)
In [35]: arr
Out[35]:
array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, (2, 2.5), 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, (1, 6.5), 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], dtype=object)
In [36]: idx = np.ix_([1, 2, 5, 9], [1, 2, 5, 9])
In [37]: sub = np.asarray([[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], 'float,float')
In [38]: arr[idx] = np.where(arr[idx], arr[idx], sub)
In [39]: arr
Out[39]:
array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, (2, 2.5), (2.0, 3.2), 0.0, 0.0, (3.0, 4.6), 0.0, 0.0, 0.0,
(4.0, 3.4)],
[0.0, (1, 6.5), (4.0, 0.4), 0.0, 0.0, (5.0, 3.2), 0.0, 0.0, 0.0,
(6.0, 2.3)],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, (3.0, 4.5), (5.0, 2.3), 0.0, 0.0, (7.0, 5.3), 0.0, 0.0, 0.0,
(9.0, 2.3)],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, (12.0, 3.2), (45.0, 2.4), 0.0, 0.0, (32.0, 2.3), 0.0, 0.0,
0.0, (6.0, 5.4)]], dtype=object)

However, I have to ask: why?! Why are you storing your data this way? It completely defeats the purpose of numpy...
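
Note that the np.where call above only keeps whatever is already truthy in arr; it does not compare the second items, which is why (1, 6.5) survives at position (2,1) in the output. If the object layout has to stay, one possible sketch of the full rule follows. The helper second_or_inf and the use of np.frompyfunc are assumptions of this example, and frompyfunc still calls Python once per element, so this is not truly vectorized; it only limits the work to the addressed block.

```python
import numpy as np

# Build the (10, 10) object array from the question
arr = np.full((10, 10), 0.0, dtype=object)
arr[1, 1] = (2, 2.5)
arr[2, 1] = (1, 6.5)

rows = [[(1, 3.2), (2, 3.2), (3, 4.6), (4, 3.4)],
        [(3, 4.5), (4, 0.4), (5, 3.2), (6, 2.3)],
        [(3, 4.5), (5, 2.3), (7, 5.3), (9, 2.3)],
        [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]]
sub = np.empty((4, 4), dtype=object)
sub[:] = rows                         # (4, 4) array whose elements are tuples

idx = np.ix_([1, 2, 5, 9], [1, 2, 5, 9])


def second_or_inf(x):
    """Hypothetical helper: second item of a tuple, or +inf for a non-tuple."""
    return x[1] if isinstance(x, tuple) else np.inf


second = np.frompyfunc(second_or_inf, 1, 1)

block = arr[idx]                      # copy of the addressed cells, shape (4, 4)
cur2 = second(block).astype(float)    # +inf wherever no tuple is stored yet
new2 = second(sub).astype(float)

mask = new2 < cur2                    # True for empty cells and for smaller second items
block[mask] = sub[mask]
arr[idx] = block                      # write the updated block back

print(arr[1, 1], arr[2, 1])
```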