Computing a kernel matrix with NumPy



I have data of shape d × N (each column is a feature vector), and I have this code to compute the kernel matrix:

import numpy as np

def kernel(x1, x2):
    return x1.T @ x2

data = np.array([[1,2,3], [1,2,3], [1,2,3]])
result = []
for i in range(data.shape[1]):
    current_result = []
    for j in range(data.shape[1]):
        x1 = data[:, i]
        x2 = data[:, j]
        current_result.append(kernel(x1, x2))
    result.append(current_result)
np.array(result)

The result I get is:

array([[ 3,  6,  9],
       [ 6, 12, 18],
       [ 9, 18, 27]])

The problem is that this code is too slow, so I tried using np.vectorize:

vec = np.vectorize(kernel, signature='(n),(n)->()')
vec(data, data)

But I get the wrong result:

array([14, 14, 14])

What am I doing wrong?

Testing the problem at larger dimensions, with random numbers to ensure robustness, e.g. shape (100, 200), there are several approaches:

import numpy as np

def kernel(x1, x2):
    return x1.T @ x2

def kernel_kenny(a):
    result = []
    for i in range(a.shape[1]):
        current_result = []
        for j in range(a.shape[1]):
            x1 = a[:, i]
            x2 = a[:, j]
            current_result.append(kernel(x1, x2))
        result.append(current_result)
    return np.array(result)
a = np.random.random((100,200))
res1 = kernel_kenny(a)
# perhaps einsum signature might help you to understand the calculations
res2 = np.einsum('ji,jk->ik', a, a, optimize=True)
# or the following if you want to explicitly specify the transpose
# res2 = np.einsum('ij,jk->ik', a.T, a, optimize=True)

# or simply ...
res3 = a.T @ a
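To see what the signature 'ji,jk->ik' means, a small self-contained check may help (the matrix b and the nested-sum comparison are introduced here purely for illustration): entry (i, k) of the result is the dot product of column i with column k, i.e. sum over j of a[j, i] * a[j, k].

```python
import numpy as np

b = np.random.random((5, 4))
res = np.einsum('ji,jk->ik', b, b)

# entry (i, k) is the dot product of column i with column k:
# res[i, k] == sum_j b[j, i] * b[j, k]
manual = np.array([[sum(b[j, i] * b[j, k] for j in range(b.shape[0]))
                    for k in range(b.shape[1])]
                   for i in range(b.shape[1])])
assert np.allclose(res, manual)
assert np.allclose(res, b.T @ b)
```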

Here is a sanity check:

np.allclose(res1,res2)
>>> True
np.allclose(res1,res3)
>>> True

And the timings:

%timeit kernel_kenny(a)
>>> 83.2 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('ji,jk->ik', a, a, optimize=True)
>>> 325 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit a.T @ a
>>> 82 µs ± 9.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
