Data preprocessing for KNN in Python



Preprocessing is taking me a long time to understand: tuples, lists, floats, array structures. The data looks like this:

<bound method NDFrame.head of                                                       X                                 Y
0     [1.9902, 1.9902, 1.9902, 1.9902, 1.9902, 0.034...      [0.097, 0.097, 0.097, 0.094]
1     [1.9902, 0.034, 0.034, 0.034, 0.034, 0.034, 0....      [0.094, 0.094, 0.094, 0.094]
2     [0.034, 0.034, 0.097, 0.097, 0.097, 0.097, 0.0...  [1.0882, 1.0882, 1.0882, 1.0882]
3     [0.097, 0.097, 0.097, 0.094, 0.094, 0.094, 0.0...  [1.0882, 1.2382, 1.2382, 1.2382]
4     [0.094, 0.094, 0.094, 0.094, 1.0882, 1.0882, 1...  [1.2382, 1.2382, 1.2182, 1.2182]
...                                                 ...                               ...
3395  [0.136, 0.286, 0.286, 0.286, 0.286, 0.286, 0.2...  [0.1276, 0.1276, 0.1276, 0.1276]
3396  [0.286, 0.286, 0.266, 0.266, 0.266, 0.266, 0.2...   [1.1423, 1.2923, 1.2723, 3.672]
3397  [0.266, 0.266, 0.266, 0.1276, 0.1276, 0.1276, ...      [3.672, 3.672, 3.772, 3.772]
3398  [0.1276, 0.1276, 0.1276, 0.1276, 1.1423, 1.292...      [3.772, 3.802, 3.802, 3.802]
3399  [1.1423, 1.2923, 1.2723, 3.672, 3.672, 3.672, ...      [1.021, 1.021, 1.021, 1.021]

I am using train_test_split to split the data:
x=csv_data['X']
y=csv_data['Y']
#print(x)
x_train, x_test, y_train, y_test = train_test_split(x,y)

Fitting the KNN model:

K = []
training = []
test = []
scores = {}

for k in range(2, 21):
    clf = KNeighborsClassifier(n_neighbors = k)
    clf.fit(x_train, y_train)

    training_score = clf.score(x_train, y_train)
    test_score = clf.score(x_test, y_test)
    K.append(k)

    training.append(training_score)
    test.append(test_score)
    scores[k] = [training_score, test_score]

I get this error:

TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'list'
The above exception was the direct cause of the following exception:
ValueError                                Traceback (most recent call last)
<ipython-input-93-906aa771beda> in <module>()
6 for k in range(2, 21):
7     clf = KNeighborsClassifier(n_neighbors = k)
----> 8     clf.fit(x_train, y_train)
9 
10     training_score = clf.score(x_train, y_train)
7 frames
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81 
82     """
---> 83     return array(a, dtype, copy=False, order=order)
84 
85 
ValueError: setting an array element with a sequence.

I have tried a few things, such as preprocessing.StandardScaler, but they did not work for me. Please help me get KNN running. Thanks.

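The problem is that your y has shape (n, 4), while the KNN fit method expects y to have shape (n, 1), so you can only predict one value from y. In short, either run KNN four times, once for each column of y, or don't use KNN.

The code would look something like this: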

# Import KNN for regression
from sklearn.neighbors import KNeighborsRegressor

# Assumes x is numeric, y has four numeric columns, and k is already chosen
# One target column per model
y1 = y.iloc[:, 0]
y2 = y.iloc[:, 1]
y3 = y.iloc[:, 2]
y4 = y.iloc[:, 3]
regressor1 = KNeighborsRegressor(n_neighbors=k).fit(x, y1)
regressor2 = KNeighborsRegressor(n_neighbors=k).fit(x, y2)
regressor3 = KNeighborsRegressor(n_neighbors=k).fit(x, y3)
regressor4 = KNeighborsRegressor(n_neighbors=k).fit(x, y4)
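
The traceback in the question (`float() argument must be a string or a number, not 'list'`) suggests that each cell of X and Y holds a Python list, so the columns probably need to be expanded into 2-D numeric arrays before column indexing or `fit` can work. A minimal sketch of that step followed by the per-column fit, assuming every row's list has the same length; the small `csv_data` frame and `k = 2` below are made-up stand-ins, not the asker's data:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical stand-in for the asker's data: each cell holds a list.
csv_data = pd.DataFrame({
    "X": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]],
    "Y": [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5],
          [0.3, 0.4, 0.5, 0.6], [0.4, 0.5, 0.6, 0.7]],
})

# Expand the list cells into 2-D float arrays (assumes equal list lengths).
X = np.array(csv_data["X"].tolist(), dtype=float)  # shape (n, 3)
Y = np.array(csv_data["Y"].tolist(), dtype=float)  # shape (n, 4)

k = 2  # illustrative neighbor count

# One regressor per target column, as in the answer's code.
regressors = [KNeighborsRegressor(n_neighbors=k).fit(X, Y[:, i])
              for i in range(Y.shape[1])]

# Predict all four targets for the first row and stack them side by side.
pred = np.column_stack([r.predict(X[:1]) for r in regressors])
print(pred.shape)
```

As a side note, scikit-learn's documentation lists `y` of shape `(n_samples, n_outputs)` as accepted by `KNeighborsRegressor.fit`, so fitting once on the 2-D `Y` may be worth trying as an alternative to four separate models.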

OMG!! Now I see that you are using KNN for classification, when your problem is actually regression. Your fundamentals are really weak.

Also, don't use KNN for this. You won't get good results from it, and it is computationally expensive.
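
If KNN is tried anyway, the loop over k from the question can be adapted to regression by swapping in `KNeighborsRegressor`. The sketch below is only an illustration on synthetic data (the arrays, the random seed and the single target column are assumptions, not the asker's data); note that `score` on a regressor reports R^2 rather than accuracy:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: 200 samples, 5 features, 1 continuous target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for k in range(2, 21):
    reg = KNeighborsRegressor(n_neighbors=k).fit(x_train, y_train)
    # For regressors, score() returns R^2 on the given data.
    scores[k] = (reg.score(x_train, y_train), reg.score(x_test, y_test))

# Pick the k with the highest R^2 on the held-out split.
best_k = max(scores, key=lambda k: scores[k][1])
print(best_k, scores[best_k])
```

Comparing the train and test R^2 for each k gives the same overfitting picture the question's loop was aiming for with accuracy.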
