train_test_split: error



How I split the df:

X=Final_df.drop('survived',axis=1)
Y=Final_df['survived']

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=123    )
logreg=LogisticRegression()
logreg.fit(X_train,Y_train)
train,test = train_test_split(Final_df, test_size=0.2)
Y_pred=logreg.predict(Y_test)

I'm getting an error like:

ValueError                                Traceback (most recent call last)
<ipython-input-38-f81a6db0e9ae> in <module>()
----> 1 Y_pred=logreg.predict(Y_test)
~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
322             Predicted class label per sample.
323         """
--> 324         scores = self.decision_function(X)
325         if len(scores.shape) == 1:
326             indices = (scores > 0).astype(np.int)
~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in decision_function(self, X)
298                                  "yet" % {'name': type(self).__name__})
299 
--> 300         X = check_array(X, accept_sparse='csr')
301 
302         n_features = self.coef_.shape[1]
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
439                     "Reshape your data either using array.reshape(-1, 1) if "
440                     "your data has a single feature or array.reshape(1, -1) "
--> 441                     "if it contains a single sample.".format(array))
442             array = np.atleast_2d(array)
443             # To ensure that array flags are maintained
ValueError: Expected 2D array, got 1D array instead:
array=[0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1
0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0
0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1
1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1
1 1 1 1 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 1
0 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0
1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0
1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1
1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 1 0 0
1 0 1 0 1 1 0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

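You need to predict with X_test, not Y_test. X stores the independent variables (the features used to make the prediction); Y stores the dependent variable (the value to be predicted). Y_test is a one-dimensional column of labels, which is why predict() complains that it expected a 2D array.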

So your last line should be:

Y_pred=logreg.predict(X_test)
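
To see why the original call fails, it can help to compare the shapes of the two objects. The following is only a minimal sketch: `demo_df` and its `age` and `fare` columns are a made-up stand-in for Final_df (whose real contents are not shown in the question), and `max_iter=1000` is an assumption added so the solver converges on unscaled data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Final_df: two numeric features and a binary target.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "age": rng.integers(1, 80, size=200),
    "fare": rng.uniform(5, 100, size=200),
})
demo_df["survived"] = (demo_df["fare"] > 50).astype(int)

X = demo_df.drop("survived", axis=1)   # features: 2D (rows x columns)
Y = demo_df["survived"]                # target: 1D (one label per row)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=123
)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, Y_train)

print("X_test shape:", X_test.shape)   # two dimensions
print("Y_test shape:", Y_test.shape)   # one dimension

# Correct: predict from the features.
Y_pred = logreg.predict(X_test)

# Incorrect: passing the 1D labels reproduces the ValueError from the question.
try:
    logreg.predict(Y_test)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", str(exc).splitlines()[0])
```

The model was fitted on a 2D feature table, so predict() expects input with the same layout; the labels in Y_test are only needed afterwards, to check the predictions.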

The model should predict on X_test, not Y_test.

Use this:

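from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features: every column except the target
X = Final_df.drop('survived', axis=1)
# Target: the column to be predicted
Y = Final_df['survived']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=123)
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
# Here is the change: predict from the features (X_test), not the labels (Y_test)
Y_pred = logreg.predict(X_test)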
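
Y_test still has a job: it holds the true labels that the predictions are compared against. The sketch below illustrates this under stated assumptions: the data comes from scikit-learn's `make_classification` rather than the asker's Final_df, and `accuracy_score` is just one possible metric, not something the original post uses.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption, standing in for Final_df).
X, Y = make_classification(n_samples=300, n_features=5, random_state=123)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=123
)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, Y_train)

# Predict from the held-out features...
Y_pred = logreg.predict(X_test)

# ...then compare the predictions with the held-out true labels.
accuracy = accuracy_score(Y_test, Y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```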
