I have a 2D array a of shape (2, 1403) and a list b that contains 2 lists.
a.shape == (2, 1403)                     # a is a 2D array; each row has unique elements
len(b) == 2                              # b is a list
(len(b[0]), len(b[1])) == (415, 452)     # both lists inside b also have unique elements
All elements of b[0] and b[1] are present in a[0] and a[1], respectively.
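Now I want to rearrange the elements of a based on the elements of b. All elements of b[0] (which are also present in a[0]) should appear at the end of a[0], so the new a should satisfy a[0][-len(b[0]):] == b[0], and similarly a[1][-len(b[1]):] == b[1].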
Toy example

a has elements like [[1,2,3,4,5,6,7,8,9,10,11,12],[1,2,3,4,5,6,7,8,9,10,11,12]]
b has elements like [[5, 9, 10], [2, 6, 8, 9, 11]]
new_a becomes [[1,2,3,4,6,7,8,11,12,5,9,10], [1,3,4,5,7,10,12,2,6,8,9,11]]
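To make the requirement concrete, here is a small plain-Python check on the toy example. It only verifies the two properties described above: each new row ends with the matching list from b, and each new row contains the same elements as the original row. The name checks is just for illustration.

```python
a = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
     [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
b = [[5, 9, 10], [2, 6, 8, 9, 11]]
new_a = [[1, 2, 3, 4, 6, 7, 8, 11, 12, 5, 9, 10],
         [1, 3, 4, 5, 7, 10, 12, 2, 6, 8, 9, 11]]

checks = []
for row, moved, new_row in zip(a, b, new_a):
    tail_ok = new_row[-len(moved):] == moved         # row ends with b[i]
    same_elements = sorted(new_row) == sorted(row)   # nothing added or lost
    checks.append(tail_ok and same_elements)

print(checks)  # [True, True]
```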
I wrote some code that loops over every element, which gets very slow, as shown below:
import torch

a_temp = []
for i, array in enumerate(a):
    a_temp_inner = []       # elements of this row that are not in b[i]
    remove_temp_inner = []  # elements of this row that are in b[i]
    for element in array:
        if element not in b[i]:
            a_temp_inner.append(element)
        else:
            remove_temp_inner.append(element)
    # Join the two parts per row: their lengths can differ between rows,
    # so they cannot be stacked into two separate rectangular tensors.
    a_temp.append(a_temp_inner + remove_temp_inner)
a = torch.tensor(a_temp)
Can someone help me with a better, faster implementation than this?
Assuming a is an np.array and b is a list, you can use:
# np.isin replaces np.in1d, which is deprecated since NumPy 2.0
np.array([np.concatenate((i[~np.isin(i, j)], j)) for i, j in zip(a, b)])
Output:
array([[ 1,  2,  3,  4,  6,  7,  8, 11, 12,  5,  9, 10],
       [ 1,  3,  4,  5,  7, 10, 12,  2,  6,  8,  9, 11]])
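To see why this works, here is the one-liner unpacked for the first row of the sample data. The sketch uses np.isin, which behaves like np.in1d for 1-D input. The names row, moved, mask and kept are only for illustration.

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
moved = [5, 9, 10]

mask = np.isin(row, moved)               # True where the element of row appears in moved
kept = row[~mask]                        # elements not in moved, original order preserved
new_row = np.concatenate((kept, moved))  # append moved in the order given by the list

print(mask.tolist())
print(new_row.tolist())  # [1, 2, 3, 4, 6, 7, 8, 11, 12, 5, 9, 10]
```

The list comprehension simply repeats these three steps for every (row, list) pair produced by zip(a, b).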
If b contains empty lists, a micro-optimization is possible:
np.array([np.concatenate((i[~np.isin(i, j)], j)) if j else i for i, j in zip(a, b)])
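Besides skipping work, the guard for empty lists has a side effect that may matter: as far as I can tell, concatenating an integer array with an empty Python list promotes the result to float64, because NumPy converts [] to a float64 array. A minimal sketch, with made-up variable names:

```python
import numpy as np

row = np.array([1, 2, 3])
empty = []

# Without the guard: [] is converted to an empty float64 array,
# so the concatenated result is promoted to float64.
unguarded = np.concatenate((row[~np.isin(row, empty)], empty))

# With the guard: the row is returned untouched and keeps its integer dtype.
guarded = np.concatenate((row[~np.isin(row, empty)], empty)) if empty else row

print(unguarded.dtype)  # float64
print(np.issubdtype(guarded.dtype, np.integer))  # True
```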
In my benchmarks, for np.arrays with fewer than ~100 elements, converting with .tolist() is faster than np.concatenate:
np.array([i[~np.isin(i, j)].tolist() + j for i, j in zip(a, b)])
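A rough way to repeat this comparison on your own data is timeit. The function names and the repeat count below are arbitrary choices for illustration, and the timings depend on the machine, the NumPy version and the array sizes, so no numbers are shown here.

```python
import timeit
import numpy as np

a = np.array([list(range(1, 13)), list(range(1, 13))])
b = [[5, 9, 10], [2, 6, 8, 9, 11]]

def with_concatenate():
    return np.array([np.concatenate((i[~np.isin(i, j)], j)) for i, j in zip(a, b)])

def with_tolist():
    return np.array([i[~np.isin(i, j)].tolist() + j for i, j in zip(a, b)])

# Both variants should produce the same array before their speed is compared.
same = np.array_equal(with_concatenate(), with_tolist())

t_concat = timeit.timeit(with_concatenate, number=1000)
t_tolist = timeit.timeit(with_tolist, number=1000)

print(same)
print(f"concatenate: {t_concat:.4f}s, tolist: {t_tolist:.4f}s")
```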
Sample data and imports for this solution:
import numpy as np
a = np.array([
[1,2,3,4,5,6,7,8,9,10,11,12],
[1,2,3,4,5,6,7,8,9,10,11,12]
])
b = [[5, 9, 10],
[2, 6, 8, 9, 11]]
Here is my approach:
# True where an element of a's row is NOT in the matching list of b
index_ = np.array([[i not in d for i in c] for c, d in zip(a, b)])
# First list: elements that stay in front; second list: elements moved to the end
arr_filtered = [[np.extract(ind, c) for c, ind in zip(a, index_)],
                [np.extract(np.logical_not(ind), c) for c, ind in zip(a, index_)]]
arr_final = np.array([np.concatenate((i, j)) for i, j in zip(*arr_filtered)])
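One difference worth noting: this approach keeps the moved elements in the order they appear in a, whereas the np.concatenate answer above appends them in the order of b. On the toy data the two agree because b is sorted the same way as a. A small sketch with a reordered list (a made-up input, not from the question) shows where they differ:

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5, 6])
moved = [5, 2]  # hypothetical list, deliberately not in the order of row

mask = np.isin(row, moved)

tail_in_b_order = np.concatenate((row[~mask], moved))      # tail follows the list
tail_in_a_order = np.concatenate((row[~mask], row[mask]))  # tail follows the row

print(tail_in_b_order.tolist())  # [1, 3, 4, 6, 5, 2]
print(tail_in_a_order.tolist())  # [1, 3, 4, 6, 2, 5]
```

If the tail of each row must equal b[i] exactly, as the question asks, only the first variant guarantees that for an arbitrarily ordered b.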