numpy.union that preserves order

Two arrays have been generated by dropping random values from an original array (whose elements are unique and unsorted):

orig = np.array([2, 1, 7, 5, 3, 8])

Let's say these arrays are:

a = np.array([2, 1, 7,    8])
b = np.array([2,    7, 3, 8])

Given only these two arrays, I need to merge them so that the dropped values end up in their correct positions.

The result should be:

result = np.array([2, 1, 7, 3, 8])

Another example:

a1 = np.array([2, 1, 7, 5,    8])
b1 = np.array([2,       5, 3, 8])
# the result should be: [2, 1, 7, 5, 3, 8]

Edit:

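This question is ambiguous, because it is unclear what to do when each array contains values the other lacks between the same pair of shared elements: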

a2 = np.array([2, 1, 7,       8])
b2 = np.array([2,       5, 3, 8])
# the result should be: ???

What I actually have, and the solution:

The elements of these arrays are the indices of two DataFrames containing time series. I can use pandas.merge_ordered to get the ordered index I want.
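
A minimal sketch of that approach, assuming two small hypothetical frames `left` and `right` whose timestamp index is named `t` (the names and values are made up for illustration, not taken from the real data). `pd.merge_ordered` works on columns, so the index is moved into a column first and restored afterwards:

```python
import pandas as pd

# Hypothetical time series; each frame is missing a different timestamp.
left = pd.DataFrame(
    {"x": [1.0, 2.0, 4.0]},
    index=pd.DatetimeIndex(["2021-01-01", "2021-01-02", "2021-01-04"], name="t"),
)
right = pd.DataFrame(
    {"y": [10.0, 30.0, 40.0]},
    index=pd.DatetimeIndex(["2021-01-01", "2021-01-03", "2021-01-04"], name="t"),
)

# Outer join on the timestamp column (the default for merge_ordered),
# with the result ordered by the key.
merged = pd.merge_ordered(left.reset_index(), right.reset_index(), on="t")
merged = merged.set_index("t")
print(merged.index)
```

The result is ordered by the key itself, which suits monotonically increasing timestamps; it would not reproduce an arbitrary original order such as the one in the toy arrays above.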

My previous attempts:

numpy.union1d is not suitable, because it always sorts the result:

np.union1d(a, b)
# array([1, 2, 3, 7, 8]) - not what I want

Maybe pandas can help?

These approaches take the first array in full and then append the remaining values of the second array:

pd.concat([pd.Series(index=a, dtype=int), pd.Series(index=b, dtype=int)], axis=1).index.to_numpy()
pd.Index(a).union(b, sort=False).to_numpy()  # jezrael's version
# array([2, 1, 7, 8, 3]) - not what I want

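The idea is to interleave the two arrays by stacking and flattening them, then drop duplicates while keeping the order of first appearance:

import numpy as np

a = np.array([2, 1, 7,    8])
b = np.array([2,    7, 3, 8])
# Interleave column by column: a[0], b[0], a[1], b[1], ...
c = np.vstack((a, b)).ravel(order='F')
# Index of the first occurrence of each unique value; sorting restores encounter order
_, idx = np.unique(c, return_index=True)
c = c[np.sort(idx)]
print(c)
# [2 1 7 3 8]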

Pandas solution:

import pandas as pd

# unstack() walks the frame column by column, which interleaves a and b
c = pd.DataFrame([a, b]).unstack().unique()
print(c)
# [2 1 7 3 8]

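If the arrays have different lengths:

a = np.array([2, 1, 7,    8])
b = np.array([2,    7, 3])
# The shorter column is padded with NaN; drop the padding explicitly before casting back to int
c = pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)}).stack().dropna().astype(int).unique()
print(c)
# [2 1 7 3 8]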
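
The interleaving above pairs elements by position. As an alternative, here is a hedged sketch of a two-pointer merge that does not rely on positions lining up. It is not part of the original answer: `ordered_union` is a hypothetical helper name, and it assumes both inputs are subsequences of the same original array with unique elements. In the ambiguous case from the question (neither current element occurs in the other array) it arbitrarily takes the element from `a` first.

```python
import numpy as np

def ordered_union(a, b):
    """Merge two order-consistent subsequences (hypothetical helper)."""
    a_list = np.asarray(a).tolist()
    b_list = np.asarray(b).tolist()
    in_b = set(b_list)
    out = []
    i = j = 0
    while i < len(a_list) and j < len(b_list):
        if a_list[i] == b_list[j]:
            # Shared element: emit once and advance both pointers.
            out.append(a_list[i])
            i += 1
            j += 1
        elif a_list[i] in in_b:
            # a[i] appears later in b, so b[j] must come before it.
            out.append(b_list[j])
            j += 1
        else:
            # Either b[j] appears later in a, or neither element is shared
            # (the ambiguous case); take from a first as an arbitrary tie-break.
            out.append(a_list[i])
            i += 1
    # At most one of these tails is non-empty.
    out.extend(a_list[i:])
    out.extend(b_list[j:])
    return np.array(out)

print(ordered_union(np.array([2, 1, 7, 8]), np.array([2, 7, 3, 8])))
print(ordered_union(np.array([2, 1, 7, 5, 8]), np.array([2, 5, 3, 8])))
```

For the two examples in the question this returns [2 1 7 3 8] and [2 1 7 5 3 8]. If the inputs are not consistent with a single original order, the output may contain duplicates, so that assumption matters.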
