Efficiently Rearranging a 2D NumPy Array



Suppose I have a 2D NumPy array:

x = np.random.rand(100, 100000)

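I retrieve the indices that sort it column-wise (i.e., each column is sorted independently of the others and the indices are returned):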

idx = np.argsort(x, axis=0) 

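Then, for each column, I need the values at indices = [10, 20, 30, 40, 50] to come first (in the first 5 rows of that column), followed by the rest of the sorted values (not the indices!).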
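To make the target concrete, here is a minimal sketch on a tiny array. The names `reorder_reference`, `small` and `pinned` are made up for illustration, and the sketch assumes the entries of `indices` are unique and valid row numbers. It relies on the fact that sorting the leftover rows directly gives the same values as walking the sorted indices and skipping the pinned ones.

```python
import numpy as np

def reorder_reference(x, indices):
    """Hypothetical reference: pinned rows first, then the remaining values sorted per column."""
    indices = np.asarray(indices)
    keep = np.ones(x.shape[0], dtype=bool)
    keep[indices] = False                 # mark the rows that are not pinned
    rest = np.sort(x[keep], axis=0)       # sort the leftover values column by column
    return np.vstack((x[indices], rest))

small = np.array([[9, 1],
                  [4, 8],
                  [7, 3],
                  [2, 6],
                  [5, 0]])
pinned = np.array([1, 3])
result = reorder_reference(small, pinned)
print(result)
```

Rows 1 and 3 stay on top in the order given, and everything below them is sorted within each column.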

A naive approach might be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=x.dtype)  # holds values from x, so match its dtype
for col in range(x.shape[1]):
    # For each column, fill the first few rows with the values at `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices
    # Then fill the rest of the rows in this column with the remaining sorted values excluding `indices`
    n = indices.shape[0]
    for row in range(x.shape[0]):  # walk every position of the sorted order
        if idx[row, col] not in indices:
            out[n, col] = x[idx[row, col], col]  # Again, note that we want the value, not the index
            n += 1

Approach #1

Here's one based on a previous post that doesn't need idx -

xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)
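To see why this works, here is a small sketch of the same trick, assuming float data and unique indices (the names `x_demo`, `pinned` and `sentinels` are illustrative only). The pinned rows are overwritten in a copy with values strictly below the global minimum, increasing in the order of `pinned`, so the argsort puts them first and in that order.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = rng.random((8, 4))
pinned = np.array([5, 2, 6])

# Sentinels strictly below the global minimum: min-3, min-2, min-1
sentinels = x_demo.min() - np.arange(len(pinned), 0, -1)

xc = x_demo.copy()
xc[pinned] = sentinels[:, None]           # broadcast one sentinel across each pinned row
order = xc.argsort(axis=0)                # pinned rows sort first, in the order given
out_demo = np.take_along_axis(x_demo, order, axis=0)  # gather the original values
```

Note that the values are gathered from the untouched `x_demo`, so the sentinels never show up in the result.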

Approach #2

Another one that uses idx, with np.isin masking -

mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))
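The double transpose is the non-obvious part, so here is a step-by-step sketch on a small array (the `_demo` names and `pinned` are illustrative, and the indices are assumed unique). Boolean indexing flattens row by row, so indexing the transposes keeps each column's surviving indices together and still in sorted order.

```python
import numpy as np

rng = np.random.default_rng(1)
x_demo = rng.random((6, 3))
idx_demo = np.argsort(x_demo, axis=0)
pinned = np.array([0, 4])

mask = np.isin(idx_demo, pinned)             # True where a sorted index is a pinned row
# Each column holds exactly len(pinned) True entries, so the survivors
# can be reshaped into a rectangular array afterwards.
kept = idx_demo.T[~mask.T]                   # flat, grouped column by column
kept = kept.reshape(x_demo.shape[1], -1).T   # shape (n_rows - len(pinned), n_cols)
rest = np.take_along_axis(x_demo, kept, axis=0)
out_demo = np.vstack((x_demo[pinned], rest))
```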

Approach #2 - Alternative

If you are continuously editing out to change everything except the rows at indices, an array assignment might suit you -

n = len(indices)
out[:n] = x[indices]
mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower

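This should help you get started by eliminating the innermost loop and the if condition. To start off, you can pass in x[:, col] as the input parameter x and idx[:, col] as idx.

def custom_ordering(x, idx, indices):
    # x and idx are 1-D here: one column of the data and its argsort
    # First get only the desired entries at the top
    out = x[indices]
    # Drop the values in `indices` from `idx` (np.delete would remove by position instead)
    idx2 = idx[~np.isin(idx, indices)]
    # Select the `idx2` entries and concatenate
    out = np.concatenate((out, x[idx2]), axis=0)
    return out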
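For completeness, here is one way a per-column helper could be driven over the whole array. This is only a sketch: `order_column` is a hypothetical stand-in defined here so the snippet runs on its own, and it filters the sorted indices by value with np.isin, since np.delete removes entries by position.

```python
import numpy as np

def order_column(col, col_idx, indices):
    """Hypothetical 1-D helper: pinned entries first, then the remaining sorted values."""
    remaining = col_idx[~np.isin(col_idx, indices)]   # filter by value, sorted order is kept
    return np.concatenate((col[indices], col[remaining]))

rng = np.random.default_rng(2)
x_demo = rng.random((7, 5))
idx_demo = np.argsort(x_demo, axis=0)
pinned = np.array([3, 1])

out_demo = np.empty_like(x_demo)
for c in range(x_demo.shape[1]):          # one Python-level loop over columns remains
    out_demo[:, c] = order_column(x_demo[:, c], idx_demo[:, c], pinned)
```

With 100000 columns this outer loop is still likely to be slow compared with the fully vectorized approaches, so treat it as a stepping stone.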

Here's my solution to the problem:

rem_indices = [i for i in range(x.shape[0]) if i not in indices]  # get all remaining indices
xs = np.take_along_axis(x, idx, axis=0)  # the sorted array
out = np.empty(x.shape)
out[:indices.size, :] = xs[indices, :]  # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :]  # insert the remaining values after the previous

Let me know whether I understood your problem correctly.

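I did this with a smaller array and fewer indices so that I could easily sanity-check the results, but it should translate to your use case. I think this solution is fairly efficient, since everything is done in place.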

import numpy as np
x = np.random.randint(10, size=(10, 3))
indices = np.array([5,7,9])
# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0) 

Here's the output:

>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
       [7, 4, 6],
       [6, 5, 2],
       [6, 8, 4],
       [2, 0, 2],
       [3, 0, 4],  # 5th row
       [4, 7, 4],
       [3, 1, 1],  # 7th row
       [3, 5, 3],
       [0, 5, 9]]) # 9th row
>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
       [3, 1, 1],
       [0, 5, 9]])
>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4],  #
       [3, 1, 1],  # Top of array looks like above
       [0, 5, 9],  #
       [6, 8, 4],
       [2, 0, 2],
       [7, 1, 8],  # Previous top row
       [4, 7, 4],
       [7, 4, 6],  # Previous second row
       [3, 5, 3],
       [6, 5, 2]]) # Previous third row
>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
       [3, 1, 1], # Top is the same, below is sorted
       [0, 5, 9], #
       [2, 0, 2],
       [3, 1, 2],
       [4, 4, 3],
       [6, 5, 4],
       [6, 5, 4],
       [7, 7, 6],
       [7, 8, 8]])

Edit: This version should handle the case where any element of indices is less than len(indices).

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([1,2,4])
tmp = x[indices]
# Here I just assume that there aren't any values less or equal to -1. If you use 
# float, you can use -np.inf, but there is no such equivalent for ints (which I 
# use in my example).
x[indices] = -1
# The -1 will create dummy rows that will get sorted to be on top of the array,
# which can switch with tmp later
x.sort(axis=0) 
x[:len(indices)] = tmp  # after sorting, the dummy rows occupy the top len(indices) rows
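For float data, the -np.inf sentinel mentioned in the comments could look like the sketch below. The names `x_demo`, `original` and `saved` are illustrative, and `original` exists only so the result can be checked. The saved rows are written back into the first len(pinned) rows, because that is where the sentinel rows end up after sorting.

```python
import numpy as np

rng = np.random.default_rng(3)
x_demo = rng.random((8, 3))          # float data, so -np.inf is a valid sentinel
original = x_demo.copy()             # kept only to verify the result
pinned = np.array([1, 2, 4])

saved = x_demo[pinned]               # fancy indexing returns a copy
x_demo[pinned] = -np.inf             # dummy rows that sort to the top
x_demo.sort(axis=0)                  # in place, column by column
x_demo[:len(pinned)] = saved         # replace the dummy rows with the pinned rows
```

This modifies `x_demo` in place, so it only fits if the unsorted array is no longer needed.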
