Python - sklearn fit array error

I'm relatively new to data analysis with sklearn and Python, and I'm trying to run a linear regression on a dataset that I loaded from a .csv file.

Loading the data and passing it through train_test_split works without any problem, but when I try to fit the training data, the line

model = lm.fit(X_train, y_train)

raises ValueError: Expected 2D array, got 1D array instead: ... Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

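Since I'm new to these packages, I'm trying to work out whether this happens because I didn't set up the imported CSV as a pandas DataFrame before running the regression, or whether it is caused by something else.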

My CSV is formatted like this:

Month,Date,Day of Week,Growth,Sunlight,Plants
7,7/1/17,Saturday,44,611,26
7,7/2/17,Sunday,30,507,14
7,7/5/17,Wednesday,55,994,25
7,7/6/17,Thursday,50,1014,23
7,7/7/17,Friday,78,850,49
7,7/8/17,Saturday,81,551,50
7,7/9/17,Sunday,59,506,29

This is how I set up the regression:

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

organic = pd.read_csv("linear-regression.csv")
organic.columns
Index(['Month', 'Date', 'Day of Week', 'Growth', 'Sunlight', 'Plants'], dtype='object')
# Set the dependent (Growth) and independent (Sunlight) variables
y = organic['Growth']
X = organic['Sunlight']
# Test train split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print (X_train.shape, X_test.shape)
print (y_train.shape, y_test.shape)
(192,) (49,)
(192,) (49,)
lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)
# Error pointing to an array with values from Sunlight [611, 507, 994, ...]

You just need to change the last lines to

lm = linear_model.LinearRegression()
model = lm.fit(X_train.values.reshape(-1,1), y_train)

and the model will fit. The reason is that sklearn's linear models expect

X: numpy array or sparse matrix of shape [n_samples, n_features]

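so the training data must have shape [n_samples, 1] here: [7, 1] for the seven sample rows shown in the question, or (192, 1) for the actual training split.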
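To make the shape requirement concrete, here is a minimal sketch using only NumPy and the seven Sunlight values from the sample rows in the question. The variable names are just for illustration.

```python
import numpy as np

# The seven Sunlight values from the sample CSV rows in the question.
sunlight = np.array([611, 507, 994, 1014, 850, 551, 506])

# A single column pulled out of a table is 1D: one axis, seven entries.
print(sunlight.shape)

# reshape(-1, 1) keeps one column and lets NumPy infer the number of rows,
# giving the [n_samples, n_features] layout that fit() expects.
X = sunlight.reshape(-1, 1)
print(X.shape)
```

The values themselves are unchanged; only the layout goes from a flat vector to a single-column matrix.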

You are only using one feature, so the error message itself tells you what to do:

Reshape your data using array.reshape(-1, 1) if your data has a single feature.

In scikit-learn, the feature data X always has to be 2D (the target y can stay 1D).

(Also, don't forget the typo in X = organic['Sunglight']; the column is named Sunlight.)
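As an alternative that the answers here do not mention, you can probably avoid the reshape altogether by selecting the feature with a list of column names, which makes pandas return a 2D DataFrame instead of a 1D Series. This is a sketch; the small DataFrame below is built from the sample rows in the question and stands in for the real CSV.

```python
import pandas as pd

# Sample rows copied from the question's CSV (only the two columns used).
organic = pd.DataFrame({
    "Growth":   [44, 30, 55, 50, 78, 81, 59],
    "Sunlight": [611, 507, 994, 1014, 850, 551, 506],
})

X_1d = organic["Sunlight"]     # single brackets: a Series, which is 1D
X_2d = organic[["Sunlight"]]   # double brackets: a DataFrame, which is 2D
y = organic["Growth"]          # the target can stay 1D

print(X_1d.shape, X_2d.shape, y.shape)
```

If X is 2D before the split, train_test_split hands back 2D X_train and X_test, so no conversion is needed at fit time.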

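After you pass the data through train_test_split(X, y, test_size=0.2), it returns the pandas Series X_train and X_test with shapes (192,) and (49,). As mentioned in the previous answer, sklearn expects X_train and X_test to be matrices of shape [n_samples, n_features]. You can simply convert the pandas Series X_train and X_test to pandas DataFrames, which changes their shapes to (192, 1) and (49, 1).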

lm = linear_model.LinearRegression()
model = lm.fit(X_train.to_frame(), y_train)
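For completeness, here is a small end-to-end sketch of the to_frame() approach. It assumes the seven sample rows from the question in place of the full CSV, and it adds random_state=0 (not in the original code) so that the split is repeatable.

```python
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("linear-regression.csv"): the question's sample rows.
organic = pd.DataFrame({
    "Growth":   [44, 30, 55, 50, 78, 81, 59],
    "Sunlight": [611, 507, 994, 1014, 850, 551, 506],
})

y = organic["Growth"]
X = organic["Sunlight"]

# random_state is an assumption added here to make the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lm = linear_model.LinearRegression()
model = lm.fit(X_train.to_frame(), y_train)

# The test features need the same 2D conversion before predicting.
predictions = model.predict(X_test.to_frame())

print(X_train.to_frame().shape, X_test.to_frame().shape)
print(predictions.shape)
```

Note that X_test has to be converted the same way before calling predict, otherwise the same ValueError comes back at prediction time.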
