Preprocessing a single sample - deprecation warning

On a fresh install of Anaconda under Ubuntu... I am preprocessing my data in various ways before a classification task using Scikit-Learn.

from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler().fit(train)
train = scaler.transform(train)    
test = scaler.transform(test)

This all works fine, but if I have a new sample (temp below) that I want to classify, and that I therefore want to preprocess in the same way, I do

temp = [1,2,3,4,5,5,6,....................,7]
temp = scaler.transform(temp)

and I get a deprecation warning...

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 
and will raise ValueError in 0.19. Reshape your data either using 
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.

So the question is: how should I rescale a single sample like this?

I suppose an alternative (not a very good one) would be...

temp = [temp, temp]
temp = scaler.transform(temp)
temp = temp[0]

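But I'm sure there are better ways.

Just listen to what the warning is telling you:

Reshape your data with X.reshape(-1, 1) if your data has a single feature/column, and with X.reshape(1, -1) if it contains a single sample.

For your kind of example (one sample with more than one feature/column):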

import numpy as np
temp = np.asarray(temp).reshape(1, -1)  # temp is a plain list in the question

For a single feature/column:

temp = np.asarray(temp).reshape(-1, 1)  # with numpy imported as np
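To make the single-sample case concrete, here is a minimal sketch. The training matrix, the sample values, and the names `temp_2d`, `scaled`, and `scaled_1d` are made up for illustration and are not the question's actual data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: 3 samples, 4 features each.
train = np.array([[0.0, 10.0, 100.0, 1.0],
                  [5.0, 20.0, 200.0, 2.0],
                  [10.0, 30.0, 300.0, 3.0]])
scaler = MinMaxScaler().fit(train)

# One new sample with the same 4 features, stored as a flat list.
temp = [5.0, 10.0, 300.0, 2.0]

# reshape(1, -1): one row (sample), as many columns (features) as needed.
temp_2d = np.asarray(temp).reshape(1, -1)
scaled = scaler.transform(temp_2d)   # shape (1, 4)

# Flatten back to 1-D if a plain feature vector is wanted.
scaled_1d = scaled.ravel()
```

The scaler only ever sees a 2-D array, so this avoids the warning without the duplicate-the-sample workaround from the question.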

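Well, it actually looks like the warning is telling you what to do.

As part of the uniform interface of sklearn.pipeline stages, as a rule of thumb:

when you see X, it should be an np.array with two dimensions
when you see y, it should be an np.array with a single dimension.

Therefore, here you should consider the following: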

import numpy as np

temp = [1,2,3,4,5,5,6,....................,7]
# This makes it into a 2d array with one row (a single sample)
temp = np.array(temp).reshape(1, -1)
temp = scaler.transform(temp)
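The rule of thumb about X and y can be sketched with a toy classifier. The data is invented, and `KNeighborsClassifier` is just a convenient stand-in here; the question does not say which estimator is used.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X: 2-D, shape (n_samples, n_features); y: 1-D, shape (n_samples,).
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# A single new sample is still passed as a 2-D array with one row.
new_sample = np.array([0.95, 0.9]).reshape(1, -1)
pred = clf.predict(new_sample)       # 1-D array holding one label
```

The same shape convention applies to `transform`, `predict`, and `fit`, which is why a lone sample needs the extra leading axis.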

This might help: wrap the sample in an extra pair of brackets so that it is already two-dimensional.

temp = ([[1,2,3,4,5,6,.....,7]])
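As a small check that the nested-list form is equivalent to reshaping, here is a sketch with illustrative values (the names `nested`, `reshaped`, and `promoted` are hypothetical). `np.atleast_2d` is a third option that promotes a 1-D input to a single row.

```python
import numpy as np

temp = [1, 2, 3, 4, 5, 5, 6, 7]      # illustrative values only

nested = np.array([temp])            # extra brackets -> shape (1, 8)
reshaped = np.array(temp).reshape(1, -1)
promoted = np.atleast_2d(temp)       # also (1, 8) for 1-D input
```

All three produce one row with eight columns, which is the layout a fitted transformer expects for a single sample.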

.values.reshape(-1,1) will be accepted without any alert/warning

.reshape(-1,1) will be accepted, but with a deprecation warning
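For the pandas case, a minimal sketch follows. It assumes a hypothetical DataFrame `df` with one numeric column `x_1`. Going through `.values` gives a NumPy array that can be reshaped into a column; selecting with double brackets keeps the data 2-D from the start.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"x_1": [1.0, 2.0, 3.0, 5.0]})  # hypothetical column

# One feature, many samples: a column vector of shape (4, 1).
col = df["x_1"].values.reshape(-1, 1)
scaled = MinMaxScaler().fit_transform(col)

# Double brackets keep a 2-D DataFrame, so no reshape is needed.
scaled_df = MinMaxScaler().fit_transform(df[["x_1"]])
```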

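I faced the same issue and got the same deprecation warning. I was using a numpy array of shape [23, 276] when I got the message. I tried reshaping it as the warning suggests and got nowhere. Then I selected each row from the numpy array (as I was iterating over it anyway) and assigned it to a list variable. After that it worked without any warning.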

array = []
array.append(temp[0])

Then you can use the Python list object (here 'array') as input to the sk-learn functions. Not the most efficient solution, but it worked for me.
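Wrapping a row in a list works because the result is a 2-D structure with one row. A sketch of the same idea without the intermediate list, using made-up random data with the (23, 276) shape mentioned above (`train`, `data`, and the fitted `scaler` are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
train = rng.random((100, 276))       # made-up training data
data = rng.random((23, 276))         # same shape as in the answer above
scaler = MinMaxScaler().fit(train)

# Row by row: each row is 1-D, so give it a leading sample axis first.
row_by_row = np.vstack([scaler.transform(row.reshape(1, -1)) for row in data])

# All rows at once: the array is already 2-D, so no reshape is needed.
all_at_once = scaler.transform(data)
```

Both routes give the same numbers; transforming the whole array in one call is usually simpler when the rows do not have to be handled one at a time.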

You can always reshape like this:

import numpy as np
temp = np.array([1,2,3,4,5,5,6,7])  # a plain list has no reshape method
temp = temp.reshape(len(temp), 1)

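Because the main issue is that your temp.shape is (8,)

and you need (8, 1) if these are eight samples of one feature, or (1, 8) if they are a single sample with eight features.

-1 stands for the unknown dimension of the array, which numpy infers from the array's length. Read more about the "newshape" parameter in the numpy.reshape documentation -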

# X is a 1-d ndarray
# If you want a COLUMN vector (many/one/unknown samples, 1 feature)
X = X.reshape(-1, 1)
# If you want a ROW vector (one sample, many/one/unknown features)
X = X.reshape(1, -1)
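A short sketch of how the -1 placeholder is resolved, using an eight-element array as an illustrative input (the names `col`, `row`, and `grid` are hypothetical):

```python
import numpy as np

X = np.arange(8)                     # shape (8,)

col = X.reshape(-1, 1)               # -1 inferred as 8 -> shape (8, 1)
row = X.reshape(1, -1)               # -1 inferred as 8 -> shape (1, 8)
grid = X.reshape(2, -1)              # -1 inferred as 4 -> shape (2, 4)
```

Only one dimension may be given as -1, and the total number of elements has to divide evenly by the dimensions that are given.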

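import pandas as pd
from sklearn.linear_model import LinearRegression

# df is assumed to be an existing DataFrame with columns 'x_1' and 'target'
X = df[['x_1']]
X_n = X.values.reshape(-1, 1)   # 2-D: (n_samples, 1)
y = df['target']
y_n = y.values                  # 1-D: (n_samples,)
model = LinearRegression()
model.fit(X_n, y_n)
y_pred = pd.Series(model.predict(X_n), index=X.index)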
