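I have two numpy arrays, each of which has an identification number in column 0.
Where the ID numbers in the two arrays match, I want to keep the rows associated with those IDs.
If an ID has no matching ID in the other array, I want to delete the row associated with that ID, but only from the array in which that ID appears.
Both arrays are sorted by their ID numbers.
The input arrays a and b, and the desired output arrays c and d, are shown below. Note that the arrays do not have the same number of rows (the real a and b are much larger: (2487, 12) and (2482, 12) respectively).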
In:
a =
[[9.60977, 97.5, 96, 99, 100.5, 1.60]
[9.60978, 97.5, 96, 100.5, 102, 0.31]
[9.60979, 97.5, 96, 102, 103.5, 0.11]
[9.60980, 97.5, 96, 103.5, 105, 0.05]
[9.60981, 97.5, 96, 105, 106.5, 0.03]
[9.60983, 97.5, 96, 108, 109.5, 0.01]
[9.60984, 97.5, 96, 109.5, 111, 0.01]]
b =
[[9.60977, 99, 100.5, 97.5, 96, 1.58]
[9.60979, 102, 103.5, 97.5, 96, 0.11]
[9.60980, 103.5, 105, 97.5, 96, 0.05]
[9.60981, 105, 106.5, 97.5, 96, 0.03]
[9.60982, 106.5, 108, 97.5, 96, 0.02]
[9.60984, 109.5, 111, 97.5, 96, 0.01]]
Out:
c =
[[9.60977, 97.5, 96, 99, 100.5, 1.60]
[9.60979, 97.5, 96, 102, 103.5, 0.11]
[9.60980, 97.5, 96, 103.5, 105, 0.05]
[9.60981, 97.5, 96, 105, 106.5, 0.03]
[9.60984, 97.5, 96, 109.5, 111, 0.01]]
d =
[[9.60977, 99, 100.5, 97.5, 96, 1.58]
[9.60979, 102, 103.5, 97.5, 96, 0.11]
[9.60980, 103.5, 105, 97.5, 96, 0.05]
[9.60981, 105, 106.5, 97.5, 96, 0.03]
[9.60984, 109.5, 111, 97.5, 96, 0.01]]
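To make the example reproducible, the following sketch builds a and b as NumPy arrays with the values listed above and computes which IDs occur in both arrays, i.e. the IDs whose rows c and d should keep. The name expected_ids is only illustrative.

```python
import numpy as np

# Sample input arrays from the question; column 0 holds the ID
a = np.array([[9.60977, 97.5, 96, 99, 100.5, 1.60],
              [9.60978, 97.5, 96, 100.5, 102, 0.31],
              [9.60979, 97.5, 96, 102, 103.5, 0.11],
              [9.60980, 97.5, 96, 103.5, 105, 0.05],
              [9.60981, 97.5, 96, 105, 106.5, 0.03],
              [9.60983, 97.5, 96, 108, 109.5, 0.01],
              [9.60984, 97.5, 96, 109.5, 111, 0.01]])
b = np.array([[9.60977, 99, 100.5, 97.5, 96, 1.58],
              [9.60979, 102, 103.5, 97.5, 96, 0.11],
              [9.60980, 103.5, 105, 97.5, 96, 0.05],
              [9.60981, 105, 106.5, 97.5, 96, 0.03],
              [9.60982, 106.5, 108, 97.5, 96, 0.02],
              [9.60984, 109.5, 111, 97.5, 96, 0.01]])

print(a.shape, b.shape)  # (7, 6) (6, 6)

# Sorted IDs present in both arrays: the rows c and d should retain
expected_ids = np.intersect1d(a[:, 0], b[:, 0])
print(expected_ids)
```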
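I tried using a pair of if statements inside a for loop, but this failed because 1) the arrays are not the same length (see the traceback below), and 2) once a row has been deleted, the loop does not re-test the row that moves into its place.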
for i in np.arange(0, max(len(a), len(b)), 1):
    if a[i, 0] > b[i, 0]:
        a = np.delete(a, i, 0)
    if a[i, 0] < b[i, 0]:
        b = np.delete(b, i, 0)

Traceback (most recent call last):
  File "<ipython-input-271-509fc93aea3b>", line 2, in <module>
    if a[i, 0] > b[i, 0]:
IndexError: index 4 is out of bounds for axis 0 with size 3
I also tried a while loop, but it deleted all the wrong rows from array b:
n = 0
s = max(len(a), len(b))
c = np.array(())
d = np.array(())
while n != s:
    if a[n, 0] == b[n, 0]:
        c = np.append(c, a[n, :])
        d = np.append(d, b[n, :])
        n = n+1
    elif a[n, 0] > b[n, 0]:
        a = np.delete(a, n, 0)
    elif a[n, 0] < b[n, 0]:
        b = np.delete(b, n, 0)

Traceback (most recent call last):
  File "<ipython-input-285-f7c600c498cb>", line 6, in <module>
    if a[n, 0] == b[n, 0]:
IndexError: index 1 is out of bounds for axis 0 with size 1
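Because both arrays are sorted by ID, a loop can work if it keeps a separate index for each array instead of sharing one index. Note also that in both attempts the delete branches are swapped: when a[n, 0] < b[n, 0], it is the current row of a that cannot have a partner in b (and vice versa). A minimal sketch of such a two-pointer walk, assuming IDs are unique within each array and sorted ascending; match_sorted_rows is a hypothetical helper name, and the small arrays are trimmed versions of the data above.

```python
import numpy as np

def match_sorted_rows(a, b):
    """Keep only the rows of a and b whose column-0 IDs appear in both.

    Assumes both arrays are sorted ascending by column 0 and that IDs
    are unique within each array.
    """
    i, j = 0, 0              # separate cursors for a and b
    keep_a, keep_b = [], []
    while i < len(a) and j < len(b):
        if a[i, 0] == b[j, 0]:      # IDs match: keep both rows
            keep_a.append(i)
            keep_b.append(j)
            i += 1
            j += 1
        elif a[i, 0] < b[j, 0]:     # a's ID cannot occur in b: skip it
            i += 1
        else:                       # b's ID cannot occur in a: skip it
            j += 1
    # Index once at the end instead of calling np.delete inside the loop
    return a[keep_a], b[keep_b]

a = np.array([[9.60977, 1.60], [9.60978, 0.31], [9.60979, 0.11]])
b = np.array([[9.60977, 1.58], [9.60979, 0.11], [9.60982, 0.02]])
c, d = match_sorted_rows(a, b)
print(c[:, 0], d[:, 0])
```

Collecting indices and indexing once at the end also avoids np.delete, which returns a new copy of the array on every call.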
Is there a more sensible way to delete and keep rows based on their ID numbers?
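You can use np.isin (available in NumPy 1.13 and later; on older versions np.in1d does the same job for 1-D input) to find where the values in the first column of each array occur among the first-column values of the other array. After that, it is just a matter of indexing.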
c = a[np.isin(a[:,0],b[:,0])]
d = b[np.isin(b[:,0],a[:,0])]
>>> c
array([[ 9.60977000e+00, 9.75000000e+01, 9.60000000e+01,
9.90000000e+01, 1.00500000e+02, 1.60000000e+00],
[ 9.60979000e+00, 9.75000000e+01, 9.60000000e+01,
1.02000000e+02, 1.03500000e+02, 1.10000000e-01],
[ 9.60980000e+00, 9.75000000e+01, 9.60000000e+01,
1.03500000e+02, 1.05000000e+02, 5.00000000e-02],
[ 9.60981000e+00, 9.75000000e+01, 9.60000000e+01,
1.05000000e+02, 1.06500000e+02, 3.00000000e-02],
[ 9.60984000e+00, 9.75000000e+01, 9.60000000e+01,
1.09500000e+02, 1.11000000e+02, 1.00000000e-02]])
>>> d
array([[ 9.60977000e+00, 9.90000000e+01, 1.00500000e+02,
9.75000000e+01, 9.60000000e+01, 1.58000000e+00],
[ 9.60979000e+00, 1.02000000e+02, 1.03500000e+02,
9.75000000e+01, 9.60000000e+01, 1.10000000e-01],
[ 9.60980000e+00, 1.03500000e+02, 1.05000000e+02,
9.75000000e+01, 9.60000000e+01, 5.00000000e-02],
[ 9.60981000e+00, 1.05000000e+02, 1.06500000e+02,
9.75000000e+01, 9.60000000e+01, 3.00000000e-02],
[ 9.60984000e+00, 1.09500000e+02, 1.11000000e+02,
9.75000000e+01, 9.60000000e+01, 1.00000000e-02]])
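The scientific notation in these results is only NumPy's default display format; the stored values are identical to the input. If fixed-point output is preferred, np.printoptions (a context manager available in NumPy 1.15 and later) with suppress=True changes the display temporarily. A small sketch using a three-row excerpt of c:

```python
import numpy as np

c = np.array([[9.60977, 97.5, 96, 99, 100.5, 1.60],
              [9.60979, 97.5, 96, 102, 103.5, 0.11],
              [9.60984, 97.5, 96, 109.5, 111, 0.01]])

# Switch to fixed-point notation inside the block only;
# the previous print options are restored when it exits
with np.printoptions(suppress=True):
    print(c)
    text = np.array2string(c)
```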
Explanation:
>>> np.isin(a[:,0],b[:,0])
array([ True, False, True, True, True, False, True], dtype=bool)
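The above simply shows where the values in the first column of a occur in the first column of b. You can then use that boolean array to index a, as in the code shown above: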
c = a[np.isin(a[:,0],b[:,0])]
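The same boolean mask can also be inverted with ~ to see which rows are dropped. A minimal sketch, using trimmed versions of a and b that keep only the ID column and the last column; the names mask_a, mask_b, dropped_a and dropped_b are illustrative.

```python
import numpy as np

a = np.array([[9.60977, 1.60], [9.60978, 0.31], [9.60979, 0.11],
              [9.60980, 0.05], [9.60981, 0.03], [9.60983, 0.01],
              [9.60984, 0.01]])
b = np.array([[9.60977, 1.58], [9.60979, 0.11], [9.60980, 0.05],
              [9.60981, 0.03], [9.60982, 0.02], [9.60984, 0.01]])

mask_a = np.isin(a[:, 0], b[:, 0])   # True where a's ID also occurs in b
mask_b = np.isin(b[:, 0], a[:, 0])   # True where b's ID also occurs in a

c = a[mask_a]              # rows of a to keep
d = b[mask_b]              # rows of b to keep
dropped_a = a[~mask_a]     # rows of a whose ID is missing from b
dropped_b = b[~mask_b]     # rows of b whose ID is missing from a
print(dropped_a[:, 0], dropped_b[:, 0])
```

Because the IDs are floats, np.isin compares them exactly; that is fine here because both arrays contain the same decimal values, but IDs computed in different ways might not compare equal.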