Multiple linear regression machine learning in Python -- ValueError: shapes (8,15) and (390,) not aligned



I am trying to use multiple linear regression (machine learning) to predict an output from a set of inputs. I have trained on the data and get the expected values when I run the following code:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
#dataset = pd.read_csv('50_Startups.csv')
dataset = pd.read_excel('MAHI2.xlsx')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X[:, 0] = labelencoder.fit_transform(X[:, 0])
labelencoder1 = LabelEncoder()
X[:, 1] = labelencoder.fit_transform(X[:, 1])
labelencoder2 = LabelEncoder()
X[:, 2] = labelencoder.fit_transform(X[:, 2])
labelencoder3 = LabelEncoder()
X[:, 3] = labelencoder.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = "all")
#X = onehotencoder.fit_transform(X).toarray()
X = onehotencoder.fit_transform(X).toarray()
# Avoiding the Dummy Variable Trap
X = X[:, 1:]

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X, y)
y_pred = regressor.predict(X)
df = pd.DataFrame({'Actual': y.flatten(), 'Predicted': y_pred.flatten()})
df

Now I am trying to use the same model to evaluate another set of input data, as follows:

dataset1 = pd.read_excel('MAHI3.xlsx')
#dataset2 = pd.get_dummies(dataset1)
X1 = dataset1.iloc[:, :-1].values
y2 = dataset1.iloc[:, 5].values                 
# Encoding categorical data
#labelencoder3 = LabelEncoder()
X1[:, 0] = labelencoder.fit_transform(X1[:, 0])
#labelencoder4 = LabelEncoder()
X1[:, 1] = labelencoder.fit_transform(X1[:, 1])
#labelencoder5 = LabelEncoder()
X1[:, 2] = labelencoder.fit_transform(X1[:, 2])
#labelencoder6 = LabelEncoder()
X1[:, 3] = labelencoder.fit_transform(X1[:, 3])
#onehotencoder2 = OneHotEncoder(categorical_features = "all")
X1 = onehotencoder.fit_transform(X1).toarray()
output = regressor.predict(X1)
df1 = pd.DataFrame({'Actual1': y2.flatten(), 'Predicted1': output.flatten()})
df1

But when I run this code I get the following error: ValueError: shapes (6,13) and (390,) not aligned: 13 (dim 1) != 390 (dim 0). It would be great if someone could help me solve this.

I can't access your dataset, but your problem looks like a dimension mismatch, and the thing that changes the dimensions is the onehotencoder: calling fit_transform again on the new data re-learns the categories from that file and can produce a different number of one-hot columns than the model was trained on.

Try using the same fitted one-hot encoder for both datasets:

ohe = onehotencoder.fit(X)
X = ohe.transform(X).toarray()
X1 = ohe.transform(X1).toarray()

That ensures the regressor model receives the same number of features it was trained on.
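
For completeness, here is a minimal sketch of the whole pattern, assuming MAHI2.xlsx and MAHI3.xlsx share the same column layout, that the first four feature columns are categorical as in the question, and that the new file contains no categories unseen during training (otherwise transform raises a ValueError). It keeps the older scikit-learn API from the question (the categorical_features parameter was removed in later scikit-learn releases); the point is simply that every encoder is fitted once, on the training data, and then reused with transform on the new data, and that the dummy-variable-trap column is dropped from both matrices:

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.linear_model import LinearRegression

# Training data
dataset = pd.read_excel('MAHI2.xlsx')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values

# New data to score with the already-trained model
dataset1 = pd.read_excel('MAHI3.xlsx')
X1 = dataset1.iloc[:, :-1].values

# One LabelEncoder per categorical column, fitted on the training data only
label_encoders = []
for col in range(4):
    le = LabelEncoder()
    X[:, col] = le.fit_transform(X[:, col])   # fit + transform on training data
    X1[:, col] = le.transform(X1[:, col])     # transform only on new data
    label_encoders.append(le)

# A single one-hot encoder, also fitted on the training data only
onehotencoder = OneHotEncoder(categorical_features="all")
onehotencoder.fit(X)
X = onehotencoder.transform(X).toarray()
X1 = onehotencoder.transform(X1).toarray()

# Avoid the dummy variable trap in both matrices the same way
X = X[:, 1:]
X1 = X1[:, 1:]

regressor = LinearRegression()
regressor.fit(X, y)

# X1 now has exactly as many columns as X, so predict no longer fails
output = regressor.predict(X1)
print(output)

With the encoders reused this way, the new feature matrix ends up with exactly as many columns as the training matrix, so regressor.predict(X1) no longer hits the shape mismatch in the error message.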
