Comparing two matrices and creating a matrix of their common values

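I am currently trying to compare two matrices in Python and return the matching rows in an "intersection matrix". Both matrices contain numerical data, and I want to return the rows they have in common. (I have also tried building a single matrix by matching entries along the first column and then creating an accompanying tuple.) The matrices do not necessarily have the same dimensions.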

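Suppose I have two arbitrary matrices with the same number of columns (they can be very large and can have different numbers of rows):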

23 3 4 5       23 3 4 5
12 6 7 8       45 7 8 9
45 7 8 9       34 5 6 7
67 4 5 6       3 5 6 7

I would like to create a matrix holding the "intersection", which for this low-dimensional example is:

23 3 4 5
45 7 8 9

Or perhaps the two matrices look like this:

1 2 3 4  2 4 6 7
2 4 6 7  4 10 6 9
4 6 7 8  5 6 7 8
5 6 7 8

In that case we only want:

2 4 6 7
5 6 7 8

I have tried something along these lines:

def compare(x):
#    This is a matrix I created with another function-purely numerical data of arbitrary size with fixed column length D
     y =n_c(data_cleaner(x))
#    this is a second matrix that i'd like to compare it to.  note that the sizes are probably not the same, but the columns length are
     z=data_cleaner(x)
#    I initialized an array that would hold the matching values 
     compare=[]
#    create nested for loop that will check a single index in one matrix over all entries in the second matrix over iteration
     for i in range(len(y)):
        for j in range(len(z)):
            if y[0][i] == z[0][i]:
#            I want the row or the n tuple (shown here) of those columns  with the matching first indexes as shown above    
             c_vec = ([0][i],[15][i],[24][i],[0][25],[0][26])
                compare.append(c_vec)
            else:
                pass
    return compare 
compare(c_i_w)

Sadly, I am getting some errors. Specifically, I seem to be telling Python to reference the values incorrectly.

Consider the arrays a and b

import numpy as np

a = np.array([
        [23, 3, 4, 5],
        [12, 6, 7, 8],
        [45, 7, 8, 9],
        [67, 4, 5, 6]
    ])
b = np.array([
        [23, 3, 4, 5],
        [45, 7, 8, 9],
        [34, 5, 6, 7],
        [ 3, 5, 6, 7]
    ])
print(a)
[[23  3  4  5]
 [12  6  7  8]
 [45  7  8  9]
 [67  4  5  6]]
print(b)
[[23  3  4  5]
 [45  7  8  9]
 [34  5  6  7]
 [ 3  5  6  7]]

We can then broadcast and get a boolean array in which entry [i, j] is True when row i of a equals row j of b

x = (a[:, None] == b).all(-1)
print(x)
[[ True False False False]
 [False False False False]
 [False  True False False]
 [False False False False]]
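To make the shapes in the broadcasting step explicit, here is a minimal sketch. The array b_short is a hypothetical second matrix with fewer rows than a, used only to illustrate that the row counts do not have to match as long as the column counts do.

```python
import numpy as np

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
# b_short is a hypothetical second matrix with fewer rows than a
b_short = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7]])

# a[:, None] has shape (4, 1, 4); b_short has shape (3, 4).
# Broadcasting pairs every row of a with every row of b_short: shape (4, 3, 4).
elementwise = a[:, None] == b_short

# all(-1) reduces over the column axis, so x[i, j] is True only when
# row i of a equals row j of b_short in every column: shape (4, 3).
x = elementwise.all(-1)

print(elementwise.shape)
print(x.shape)
```

The intermediate array holds one boolean per (row of a, row of b_short, column) combination, so its size grows with the product of the two row counts.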

Using np.where we can identify the indices

i, j = np.where(x)

which shows the matching rows of a

print(a[i])
[[23  3  4  5]
 [45  7  8  9]]

and the matching rows of b

print(b[j])
[[23  3  4  5]
 [45  7  8  9]]

They are the same! Good. That is what we wanted.
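Since the question mentions that the matrices can be very large, the pairwise comparison may need more memory than is available. As a possible alternative (not part of the approach above), the rows can be hashed as tuples. The function name intersect_rows is made up for this sketch, and it assumes the entries are exactly comparable values such as integers.

```python
import numpy as np

def intersect_rows(a, b):
    """Return the rows of a that also occur in b, in the order they appear in a."""
    # Hash every row of b once so each lookup is a set membership test
    b_rows = {tuple(row) for row in b.tolist()}
    # Boolean mask over the rows of a: True where the row also occurs in b
    keep = np.array([tuple(row) in b_rows for row in a.tolist()], dtype=bool)
    return a[keep]

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
b = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7], [3, 5, 6, 7]])

common = intersect_rows(a, b)
print(common)
```

This only returns the matching rows, not the index pairs (i, j), and a row that is duplicated in a is returned once per occurrence.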

We can put the results into a pandas DataFrame with a MultiIndex, where the first level holds the row numbers from a and the second level holds the row numbers from b.

import pandas as pd

pd.DataFrame(a[i], [i, j])
      0  1  2  3
0 0  23  3  4  5
2 1  45  7  8  9
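The question also mentions matching only on the first column. A minimal sketch of that variant, assuming the first-column values are unique within each matrix (np.intersect1d reports only the first occurrence of each value), could look like this. The names keys, ia, ib and pairs are chosen here for illustration.

```python
import numpy as np

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
b = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7], [3, 5, 6, 7]])

# Find first-column values present in both matrices, plus their row positions.
# Assumption: each first-column value occurs at most once per matrix.
keys, ia, ib = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)

# Pair up the full rows that share a first-column value
pairs = [(tuple(a[i].tolist()), tuple(b[j].tolist())) for i, j in zip(ia, ib)]

print(keys)
print(pairs)
```

Unlike the full-row comparison, this pairs rows even when the remaining columns differ, so it fits the case where the first column acts as an identifier.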
