How to sample similar vectors in PyTorch, given a vector and a cosine similarity



I have this vector:

>>> vec
tensor([[0.2677, 0.1158, 0.5954, 0.9210, 0.3622, 0.4081, 0.4477, 0.7930, 0.1161,
         0.5111, 0.2010, 0.3680, 0.1162, 0.1563, 0.4478, 0.9732, 0.7962, 0.0873,
         0.9793, 0.9382, 0.9468, 0.0851, 0.7601, 0.0322, 0.7553, 0.4025, 0.3627,
         0.5706, 0.3015, 0.1344, 0.8343, 0.8187, 0.4287, 0.5785, 0.9527, 0.1632,
         0.2890, 0.5411, 0.5319, 0.7163, 0.3166, 0.5717, 0.5018, 0.5368, 0.3321]])

Using this vector, I want to generate 15 vectors whose cosine similarity with it is greater than 80%.

How can I do this in PyTorch?

I adapted the answer here by adding an extra dimension and converting it from NumPy to torch.

import torch

def torch_cos_sim(v, cos_theta, n_vectors=1, EXACT=True):
    """
    EXACT - if True, all vectors will have exactly cos_theta similarity.
            if False, all vectors will have >= cos_theta similarity
    v - original vector (1D tensor)
    cos_theta - cos similarity in range [-1,1]
    """
    # unit vector in direction of v, repeated once per output vector
    u = v / torch.norm(v)
    u = u.unsqueeze(0).repeat(n_vectors, 1)
    # random vectors with elements in range [-1,1]
    r = torch.rand([n_vectors, len(v)]) * 2 - 1
    # unit vectors perpendicular to u: remove the component of r along u, then normalize
    uperp = torch.stack([r[i] - (torch.dot(r[i], u[i]) * u[i]) for i in range(len(u))])
    uperp = uperp / torch.norm(uperp, dim=1, keepdim=True)
    if not EXACT:
        # draw one similarity per vector uniformly from [cos_theta, 1)
        cos_theta = torch.rand(n_vectors) * (1 - cos_theta) + cos_theta
        cos_theta = cos_theta.unsqueeze(1)
    # as_tensor accepts both the float (EXACT) and the tensor (not EXACT) case
    cos_theta = torch.as_tensor(cos_theta)
    # w is the linear combination of u and uperp with coefficients costheta
    # and sin(theta) = sqrt(1 - costheta**2), respectively:
    w = cos_theta * u + torch.sqrt(1 - cos_theta ** 2) * uperp
    return w
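
For what it's worth, the same construction can probably be written without the Python loop. The following is only a sketch of that idea, not the original answer's code: the name sample_similar and its parameters are made up for illustration, it draws the random directions with torch.randn instead of torch.rand, and it flattens its input so that a tensor of shape (1, d), like the vec in the question, is accepted as well.

```python
import torch
import torch.nn.functional as F

def sample_similar(v, min_cos, n_vectors=15, exact=False):
    """Hypothetical vectorized variant of the construction above.

    Returns an (n_vectors, d) tensor of unit vectors whose cosine
    similarity with v is min_cos (exact=True) or at least min_cos.
    """
    v = v.flatten()                                   # accept shape (d,) or (1, d)
    u = F.normalize(v, dim=0)                         # unit vector along v
    r = torch.randn(n_vectors, v.numel(), dtype=v.dtype)
    # Remove the component of each random row along u, then normalize,
    # which leaves unit vectors perpendicular to u.
    r_perp = r - (r @ u).unsqueeze(1) * u
    u_perp = F.normalize(r_perp, dim=1)
    if exact:
        cos = torch.full((n_vectors, 1), float(min_cos), dtype=v.dtype)
    else:
        # One similarity per row, uniform in [min_cos, 1). This is an
        # assumption of the sketch, not a uniform draw over directions.
        cos = torch.rand(n_vectors, 1, dtype=v.dtype) * (1 - min_cos) + min_cos
    sin = torch.sqrt(1 - cos ** 2)
    # cos * u + sin * u_perp has unit norm and dot product cos with u.
    return cos * u + sin * u_perp
```

For the case in the question this would be called as sample_similar(vec, 0.8, n_vectors=15). Because of floating-point rounding, a computed similarity can land a hair below the requested value, so compare with a small tolerance.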

You can check the output with:

vec = torch.rand(54)
output = torch_cos_sim(vec, 0.6, n_vectors=15, EXACT=False)
# test cos similarity
for item in output:
    print(torch.dot(vec, item) / (torch.norm(vec) * torch.norm(item)))
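
The per-row check above can also be done in one call with torch.nn.functional.cosine_similarity. A minimal sketch follows; the helper name min_cosine_similarity and the toy vectors are invented for illustration and are not part of the answer.

```python
import torch
import torch.nn.functional as F

def min_cosine_similarity(reference, candidates):
    """Smallest cosine similarity between reference (d,) and the rows of candidates (n, d)."""
    sims = F.cosine_similarity(candidates, reference.unsqueeze(0), dim=1)
    return sims.min().item()

# Toy data (hypothetical): one row parallel to the reference, one 45 degrees away.
reference = torch.tensor([1.0, 0.0, 0.0])
candidates = torch.tensor([[2.0, 0.0, 0.0],
                           [1.0, 1.0, 0.0]])
lowest = min_cosine_similarity(reference, candidates)
print(lowest >= 0.8)  # False: the second row has similarity of about 0.707
```

Comparing the minimum against the threshold answers the "are all of them above 80%" question in one step.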
