Deleting/keeping numpy array rows based on ID values



I have two numpy arrays, each of which has an identification number in column 0.

Where the ID numbers in the two arrays match, I want to keep the corresponding rows associated with those IDs.

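If an ID in one array has no matching ID in the other array, I want to delete the row associated with that ID, but only from the array in which that ID appears.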

Both arrays are sorted by their ID numbers.

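The input arrays a & b and the desired output arrays c & d are shown below. Note that the arrays do not have the same number of rows (n.b. the real a & b are much larger: (2487, 12) and (2482, 12) respectively).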

In:

a = [[9.60977,  97.5,  96,    99,    100.5,  1.60],
     [9.60978,  97.5,  96,    100.5, 102,    0.31],
     [9.60979,  97.5,  96,    102,   103.5,  0.11],
     [9.60980,  97.5,  96,    103.5, 105,    0.05],
     [9.60981,  97.5,  96,    105,   106.5,  0.03],
     [9.60983,  97.5,  96,    108,   109.5,  0.01],
     [9.60984,  97.5,  96,    109.5, 111,    0.01]]
b = [[9.60977,  99,    100.5, 97.5,  96,     1.58],
     [9.60979,  102,   103.5, 97.5,  96,     0.11],
     [9.60980,  103.5, 105,   97.5,  96,     0.05],
     [9.60981,  105,   106.5, 97.5,  96,     0.03],
     [9.60982,  106.5, 108,   97.5,  96,     0.02],
     [9.60984,  109.5, 111,   97.5,  96,     0.01]]

Out:

c = [[9.60977,  97.5,  96,    99,    100.5,  1.60],
     [9.60979,  97.5,  96,    102,   103.5,  0.11],
     [9.60980,  97.5,  96,    103.5, 105,    0.05],
     [9.60981,  97.5,  96,    105,   106.5,  0.03],
     [9.60984,  97.5,  96,    109.5, 111,    0.01]]
d = [[9.60977,  99,    100.5, 97.5,  96,     1.58],
     [9.60979,  102,   103.5, 97.5,  96,     0.11],
     [9.60980,  103.5, 105,   97.5,  96,     0.05],
     [9.60981,  105,   106.5, 97.5,  96,     0.03],
     [9.60984,  109.5, 111,   97.5,  96,     0.01]]

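I tried using a pair of if statements inside a for loop, but this failed because 1) the arrays are not the same length (see the Traceback below), and 2) once a row has been deleted, the loop does not re-test that row position.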

for i in np.arange(0, max(len(a), len(b)), 1):
    if a[i, 0] > b[i, 0]:
        a = np.delete(a, i, 0)
    if a[i, 0] < b[i, 0]:
        b = np.delete(b, i, 0)

Traceback (most recent call last):
  File "<ipython-input-271-509fc93aea3b>", line 2, in <module>
    if a[i, 0] > b[i, 0]:
IndexError: index 4 is out of bounds for axis 0 with size 3

I also tried a while loop, but it deleted the wrong rows from array b:

n = 0
s = max(len(a), len(b))
c = np.array(())
d = np.array(())
while n != s:
    if a[n, 0] == b[n, 0]:
        c = np.append(c, a[n, :])
        d = np.append(d, b[n, :])
        n = n+1
    elif a[n, 0] > b[n, 0]:
        a = np.delete(a, n, 0)
    elif a[n, 0] < b[n, 0]:
        b = np.delete(b, n, 0)

Traceback (most recent call last):
  File "<ipython-input-285-f7c600c498cb>", line 6, in <module>
    if a[n, 0] == b[n, 0]:
IndexError: index 1 is out of bounds for axis 0 with size 1
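For comparison, the loop idea can work if each array gets its own index instead of sharing a single `n`, and matching rows are collected rather than deleted in place (deleting shifts every later row, which throws the shared index off). A minimal sketch, assuming both arrays are sorted by column 0 as stated above and that IDs are unique within each array; `match_sorted_rows` is a hypothetical helper name:

```python
import numpy as np

def match_sorted_rows(a, b):
    """Return the rows of a and b whose column-0 IDs occur in both arrays.

    Hypothetical helper; assumes both inputs are sorted by column 0
    and that IDs are unique within each array.
    """
    keep_a, keep_b = [], []  # row positions to keep in a and b
    i = j = 0                # separate cursor for each array
    while i < len(a) and j < len(b):
        if a[i, 0] == b[j, 0]:   # ID present in both: keep both rows
            keep_a.append(i)
            keep_b.append(j)
            i += 1
            j += 1
        elif a[i, 0] < b[j, 0]:  # ID only in a: skip this row of a
            i += 1
        else:                    # ID only in b: skip this row of b
            j += 1
    # Integer-array indexing builds both output arrays in one step
    return (a[np.array(keep_a, dtype=np.intp)],
            b[np.array(keep_b, dtype=np.intp)])

a = np.array([[9.60977, 97.5, 96, 99, 100.5, 1.60],
              [9.60978, 97.5, 96, 100.5, 102, 0.31],
              [9.60979, 97.5, 96, 102, 103.5, 0.11],
              [9.60980, 97.5, 96, 103.5, 105, 0.05],
              [9.60981, 97.5, 96, 105, 106.5, 0.03],
              [9.60983, 97.5, 96, 108, 109.5, 0.01],
              [9.60984, 97.5, 96, 109.5, 111, 0.01]])
b = np.array([[9.60977, 99, 100.5, 97.5, 96, 1.58],
              [9.60979, 102, 103.5, 97.5, 96, 0.11],
              [9.60980, 103.5, 105, 97.5, 96, 0.05],
              [9.60981, 105, 106.5, 97.5, 96, 0.03],
              [9.60982, 106.5, 108, 97.5, 96, 0.02],
              [9.60984, 109.5, 111, 97.5, 96, 0.01]])

c, d = match_sorted_rows(a, b)
print(c[:, 0])  # IDs kept from a
print(d[:, 0])  # IDs kept from b
```

This makes a single pass over both arrays, but it is still a Python-level loop; for arrays of a few thousand rows a vectorized NumPy approach is simpler.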

Is there a more sensible way to delete and append rows using the ID numbers?

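You can use np.isin to find where the values in the first column of each array occur among the first-column values of the other array. After that, it is just a simple matter of indexing.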

c = a[np.isin(a[:,0],b[:,0])]
d = b[np.isin(b[:,0],a[:,0])]
>>> c
array([[  9.60977000e+00,   9.75000000e+01,   9.60000000e+01,
          9.90000000e+01,   1.00500000e+02,   1.60000000e+00],
       [  9.60979000e+00,   9.75000000e+01,   9.60000000e+01,
          1.02000000e+02,   1.03500000e+02,   1.10000000e-01],
       [  9.60980000e+00,   9.75000000e+01,   9.60000000e+01,
          1.03500000e+02,   1.05000000e+02,   5.00000000e-02],
       [  9.60981000e+00,   9.75000000e+01,   9.60000000e+01,
          1.05000000e+02,   1.06500000e+02,   3.00000000e-02],
       [  9.60984000e+00,   9.75000000e+01,   9.60000000e+01,
          1.09500000e+02,   1.11000000e+02,   1.00000000e-02]])
>>> d
array([[  9.60977000e+00,   9.90000000e+01,   1.00500000e+02,
          9.75000000e+01,   9.60000000e+01,   1.58000000e+00],
       [  9.60979000e+00,   1.02000000e+02,   1.03500000e+02,
          9.75000000e+01,   9.60000000e+01,   1.10000000e-01],
       [  9.60980000e+00,   1.03500000e+02,   1.05000000e+02,
          9.75000000e+01,   9.60000000e+01,   5.00000000e-02],
       [  9.60981000e+00,   1.05000000e+02,   1.06500000e+02,
          9.75000000e+01,   9.60000000e+01,   3.00000000e-02],
       [  9.60984000e+00,   1.09500000e+02,   1.11000000e+02,
          9.75000000e+01,   9.60000000e+01,   1.00000000e-02]])
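A self-contained version of the answer, using the sample arrays from the question, might look like the following. It assumes NumPy 1.13 or later, where `np.isin` was added (on older versions `np.in1d` gives the same result for these 1-D ID columns). `np.isin` tests exact equality, which is fine here because both arrays carry identical literal ID values; IDs produced by arithmetic might need rounding first.

```python
import numpy as np

a = np.array([[9.60977, 97.5, 96, 99, 100.5, 1.60],
              [9.60978, 97.5, 96, 100.5, 102, 0.31],
              [9.60979, 97.5, 96, 102, 103.5, 0.11],
              [9.60980, 97.5, 96, 103.5, 105, 0.05],
              [9.60981, 97.5, 96, 105, 106.5, 0.03],
              [9.60983, 97.5, 96, 108, 109.5, 0.01],
              [9.60984, 97.5, 96, 109.5, 111, 0.01]])
b = np.array([[9.60977, 99, 100.5, 97.5, 96, 1.58],
              [9.60979, 102, 103.5, 97.5, 96, 0.11],
              [9.60980, 103.5, 105, 97.5, 96, 0.05],
              [9.60981, 105, 106.5, 97.5, 96, 0.03],
              [9.60982, 106.5, 108, 97.5, 96, 0.02],
              [9.60984, 109.5, 111, 97.5, 96, 0.01]])

# Boolean masks: True where a row's ID (column 0) also occurs in the other array
mask_a = np.isin(a[:, 0], b[:, 0])
mask_b = np.isin(b[:, 0], a[:, 0])

# Boolean indexing keeps only the rows where the mask is True
c = a[mask_a]
d = b[mask_b]
```

Because both inputs are sorted by ID, c and d come out in the same ID order, so row k of c pairs with row k of d.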

Explanation

>>> np.isin(a[:,0],b[:,0])
array([ True, False,  True,  True,  True, False,  True], dtype=bool)

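The above simply shows you where the values in the first column of a occur in the first column of b. You can then index a with that boolean array, as in the code shown above: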

c = a[np.isin(a[:,0],b[:,0])]
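An alternative that is not part of the original answer: if the IDs are unique within each array, `np.intersect1d` with `return_indices=True` (available since NumPy 1.15) returns the shared IDs in sorted order together with their row positions in each array, so both outputs are built from one call. A sketch under those assumptions:

```python
import numpy as np

a = np.array([[9.60977, 97.5, 96, 99, 100.5, 1.60],
              [9.60978, 97.5, 96, 100.5, 102, 0.31],
              [9.60979, 97.5, 96, 102, 103.5, 0.11],
              [9.60980, 97.5, 96, 103.5, 105, 0.05],
              [9.60981, 97.5, 96, 105, 106.5, 0.03],
              [9.60983, 97.5, 96, 108, 109.5, 0.01],
              [9.60984, 97.5, 96, 109.5, 111, 0.01]])
b = np.array([[9.60977, 99, 100.5, 97.5, 96, 1.58],
              [9.60979, 102, 103.5, 97.5, 96, 0.11],
              [9.60980, 103.5, 105, 97.5, 96, 0.05],
              [9.60981, 105, 106.5, 97.5, 96, 0.03],
              [9.60982, 106.5, 108, 97.5, 96, 0.02],
              [9.60984, 109.5, 111, 97.5, 96, 0.01]])

# ids: sorted IDs common to both arrays
# ia, ib: positions of those IDs in a[:, 0] and b[:, 0] (first occurrences)
ids, ia, ib = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)

c = a[ia]  # rows of a whose ID also appears in b
d = b[ib]  # rows of b whose ID also appears in a
```

Since `ids` is sorted, row k of c and row k of d always share the same ID, even if the inputs were not pre-sorted.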
