I have a dataframe called neighbours_lookup with an ID column and a column of normalised data ('vec'), stored as arrays:
id vec
0 857827315 [-0.5345224838248487, -0.5345224838248487, 1.8...
1 857827311 [-0.3535533905932738, -0.3535533905932738, 2.8...
2 857827316 [-0.3535533905932738, -0.3535533905932738, -0....
3 857827312 [-0.5345224838248487, 1.8708286933869707, -0.5...
4 857827313 [-0.35355339059327373, -0.35355339059327373, -...
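I want to write a function where I can pass in an ID and get back its 10 nearest neighbours.
I looked at sklearn.neighbors, and it looks relevant, but I don't know how to use it. I tried: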
knn = NearestNeighbors(n_neighbors=10, algorithm='auto')
for row in neighbours_lookup['vec']:
    knn.fit(row.reshape(1, -1))
The error I get is:
AttributeError: 'list' object has no attribute 'reshape'
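Can anyone explain where to go from here? My dataframe will have more than 100,000 rows, so I need this to be fast.
--- EDIT ---
Thanks to 达斯爸爸's answer and some tinkering of my own, I got it working! The function is below.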
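import numpy as np
from sklearn.neighbors import NearestNeighbors

def get_k_neighbours(isbn, df, number_of_neighbours):
    def get_knn(df):
        # Fit once on every row: a list of per-row vectors, shape (n_samples, n_features)
        vector_arrays = df['vec'].to_numpy().tolist()
        return NearestNeighbors().fit(vector_arrays)

    def get_vector(df, isbn):
        # np.asarray also handles rows stored as plain lists, which have no .reshape
        return np.asarray(df.loc[df['isbn'] == isbn, 'vec'].iloc[0]).reshape(1, -1)

    def flatten_neighbour_list(nb_indexes):
        # kneighbors returns shape (1, k) for a single query; flatten it to a flat list
        nb_list = nb_indexes.tolist()
        return [item for sublist in nb_list for item in sublist]

    knn = get_knn(df)
    vector = get_vector(df, isbn)
    # Positional row indices (not isbn values), normally including the query row itself
    nb_indexes = knn.kneighbors(vector, number_of_neighbours, return_distance=False)
    nb_indexes = flatten_neighbour_list(nb_indexes)
    return nb_indexes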
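One thing worth noting for the >100,000-row case: get_k_neighbours refits NearestNeighbors on every call, so each lookup repeats the most expensive step. A possible alternative is to fit once and reuse the fitted model for every query. The sketch below assumes the same 'isbn' and 'vec' column names as the function above; the class name NeighbourIndex and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


class NeighbourIndex:
    """Hypothetical helper: fits NearestNeighbors once and reuses it for every query."""

    def __init__(self, df, key_col="isbn", vec_col="vec"):
        self.df = df.reset_index(drop=True)
        # Stack the per-row vectors (lists or arrays) into one (n_samples, n_features) array.
        self.X = np.vstack(self.df[vec_col].tolist())
        # The expensive fit runs only once, here.
        self.knn = NearestNeighbors().fit(self.X)
        # Map each key to its first row position (like .iloc[0]) to avoid scanning the column per query.
        self.position = {}
        for pos, key in enumerate(self.df[key_col]):
            self.position.setdefault(key, pos)

    def query(self, key, number_of_neighbours):
        vector = self.X[self.position[key]].reshape(1, -1)
        nb_indexes = self.knn.kneighbors(vector, number_of_neighbours, return_distance=False)
        # Positional row indices, the same kind of result get_k_neighbours returns.
        return nb_indexes[0].tolist()


# Toy usage with made-up data.
df = pd.DataFrame({
    "isbn": [101, 102, 103, 104],
    "vec": [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.1, 0.0]],
})
index = NeighbourIndex(df)
print(index.query(101, 2))  # [0, 3]
```

Building the key-to-position dictionary once also replaces the per-query boolean scan of the 'isbn' column.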
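A NumPy ndarray has a reshape method, but a plain Python list does not, hence the AttributeError. Also, calling fit inside a loop refits the model on a single row each time, so only the last row would be indexed. Instead, fit once on all rows: NearestNeighbors accepts a list of lists (or a 2-D array) of shape (n_samples, n_features).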
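To see the difference concretely, here is a small sketch using made-up values (not the question's data). It reproduces the error on a list, converts the list with np.asarray before reshaping, and fits once on a list of lists.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# A plain Python list has no .reshape method; this reproduces the question's error.
row = [0.1, 0.2, 0.3]
try:
    row.reshape(1, -1)
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'reshape'

# Converting to an ndarray first gives a single-sample query of shape (1, n_features).
query = np.asarray(row).reshape(1, -1)
print(query.shape)  # (1, 3)

# fit() wants every row at once: shape (n_samples, n_features).
vectors = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.1, 0.2, 0.3]]
X = np.asarray(vectors)
print(X.shape)  # (4, 3)

# Fit once on all rows, not once per row.
knn = NearestNeighbors(n_neighbors=2).fit(X)
indexes = knn.kneighbors(query, return_distance=False)
print(indexes)  # [[3 0]]
```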
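import numpy as np
from sklearn.neighbors import NearestNeighbors

knn = NearestNeighbors(n_neighbors=10, algorithm='auto')
# Stack the per-row vectors into one (n_samples, n_features) array and fit once
knn.fit(np.vstack(neighbours_lookup['vec'].tolist()))

def get_neighbors(query_id):
    # Look the row up by the 'id' column, not by the index label
    vector = neighbours_lookup.loc[neighbours_lookup['id'] == query_id, 'vec'].iloc[0]
    # Returns positional row indices with shape (1, 10)
    return knn.kneighbors([vector], 10, return_distance=False)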
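kneighbors returns row positions rather than values from the 'id' column, and because the query vector is part of the fitted data, the query row normally comes back as its own nearest neighbour. Below is one possible way to handle both. It assumes the model was fitted on df['vec'] in the dataframe's current row order; get_neighbour_ids is a hypothetical helper name, and the ids and vectors in the usage are toy values.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def get_neighbour_ids(df, knn, query_id, k=10):
    """Hypothetical helper: ids of the k nearest rows, excluding query_id itself.

    Assumes knn was fitted on df['vec'] in df's current row order.
    """
    position = np.flatnonzero(df["id"].to_numpy() == query_id)[0]
    vector = np.asarray(df["vec"].iloc[position]).reshape(1, -1)
    # Ask for one extra neighbour because the query row is normally its own nearest match.
    n = min(k + 1, len(df))
    nb_positions = knn.kneighbors(vector, n, return_distance=False)[0]
    nb_positions = [p for p in nb_positions if p != position][:k]
    # kneighbors returns row positions, so translate them back to the 'id' column.
    return df["id"].iloc[nb_positions].tolist()


# Toy usage with made-up ids and vectors.
df = pd.DataFrame({
    "id": [11, 12, 13, 14],
    "vec": [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.1, 0.0]],
})
knn = NearestNeighbors().fit(np.vstack(df["vec"].tolist()))
print(get_neighbour_ids(df, knn, 11, k=2))  # [14, 12]
```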