How do I get rid of this error?
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("train.csv")
clean = {"Sex": {"male":1, "female":0}}
df.replace(clean, inplace = True)
df["label"] = df['Survived']
df = df.drop(["Name","Ticket","Cabin","Embarked","Fare","Parch","Survived"], axis = 1)
df = df.dropna(axis = 0, how="any")
X = df.drop(["label"],axis = 1).values
y = df["label"].values
X_train , y_train, X_test, y_test = train_test_split(X, y, test_size = 0.7)
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
print("Accuracy on test subset: {:.3f}".format(log_reg.score(X_test, y_test)))
ERROR
Traceback (most recent call last):
  File "C:\Users\user\Documents\17kaggle'logistic.py", line 20, in <module>
    log_reg.fit(X_train, y_train)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\linear_model\logistic.py", line 1216, in fit
    order="C")
  File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 547, in check_X_y
    y = column_or_1d(y, warn=True)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 583, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (500, 5)
The error comes from this line:
X_train , y_train, X_test, y_test = train_test_split(X, y, test_size = 0.7)
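That is not the order in which train_test_split returns its outputs. With the names in this order, the variable called y_train actually receives the test part of X, a 2-D array of shape (500, 5), and LogisticRegression.fit rejects it because it expects a 1-D array of labels.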
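The mix-up can be reproduced without the Titanic file. This is a minimal sketch, assuming NumPy and scikit-learn are installed; random data of shape (714, 5) is a stand-in chosen so that test_size=0.7 produces the same 214/500 split seen in the traceback, and random_state=0 is added only for reproducibility. The exact error text differs between scikit-learn versions, but it is a ValueError about the shape of y in every version.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned Titanic features (assumption, not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(714, 5))
y = rng.integers(0, 2, size=714)

# Wrong unpacking order, as in the question.
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.7, random_state=0)
print(X_train.shape, y_train.shape)  # (214, 5) (500, 5): "y_train" is really the test part of X

error = None
try:
    LogisticRegression().fit(X_train, y_train)
except ValueError as exc:
    # fit() expects y to be 1-D, so a (500, 5) array is rejected.
    error = exc
    print("ValueError:", exc)
```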
The correct usage is:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.7)
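train_test_split returns the split arrays in the same order as the arrays you pass in, with each array's train part followed by its test part. So X is split into X_train and X_test, which are returned first, and then y is returned as y_train and y_test. Hope this helps.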
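A short sketch of the correct order, again assuming NumPy and scikit-learn and using random stand-in data rather than train.csv. It also passes a third, hypothetical array (`ids`, row indices) to show that the rule generalizes: n input arrays give 2·n outputs, always as (train, test) pairs in input order, and the rows stay aligned across arrays.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Random stand-in data (assumption): 714 rows, 5 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(714, 5))
y = rng.integers(0, 2, size=714)

# Correct order: the train/test pair for X, then the train/test pair for y.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (214, 5) (500, 5) (214,) (500,)

log_reg = LogisticRegression().fit(X_train, y_train)
accuracy = log_reg.score(X_test, y_test)
print("Accuracy on test subset: {:.3f}".format(accuracy))

# With three inputs, six outputs come back:
# X_train, X_test, y_train, y_test, ids_train, ids_test
ids = np.arange(len(X))  # hypothetical row-id array for illustration
parts = train_test_split(X, y, ids, test_size=0.7, random_state=0)
print(len(parts))  # 6
```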