ValueError: Found input variables with inconsistent numbers of samples: [164309109541]



I have built a Naive Bayes machine learning model from two dataframes named df_test and df_train. I run it in PyCharm with the code below, but when I run the model it returns:

ValueError: Found input variables with inconsistent numbers of samples: [164309109541]

from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(df_train.drop(columns = ['Interest_Rate']), df_test, test_size=1.0,random_state=109) # 70% training and 30% test


from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
gnb = GaussianNB()

#Train the model using the training sets
gnb.fit(X_train, y_train)

#Predict the response for test dataset
y_pred = gnb.predict(X_test)

Where did I go wrong?

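You want a 70/30 split, but here you made 100% of the data test data. Change test_size to 0.3 (30%) instead of 1.0 (100%). Note also that the second argument to train_test_split is the target y, which must have the same number of rows as X. Passing df_test, a different dataframe with a different row count, is what triggers the "inconsistent numbers of samples" error, so take y from df_train['Interest_Rate'] instead.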

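# X: the feature columns; y: the Interest_Rate target from the same rows, so len(X) == len(y)
X_train, X_test, y_train, y_test = train_test_split(df_train.drop(columns=['Interest_Rate']), df_train['Interest_Rate'], test_size=0.3, random_state=109)  # 70% training and 30% test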
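A minimal, self-contained sketch of the whole workflow, for illustration only. It assumes Interest_Rate holds class labels (GaussianNB is a classifier), and the df_train/df_test frames below are hypothetical synthetic stand-ins, since the real data is not shown. It first reproduces the error by passing a frame with a different row count as y, then does the split with y taken from the same frame as X.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for df_train: two synthetic features plus a
# categorical 'Interest_Rate' target with classes 1, 2 and 3.
rng = np.random.default_rng(109)
df_train = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "Interest_Rate": rng.integers(1, 4, size=200),
})
# Hypothetical stand-in for df_test: a frame with a different number of rows.
df_test = df_train.sample(n=120, random_state=0)

X = df_train.drop(columns=["Interest_Rate"])

# Reproduce the problem: X has 200 rows but the frame passed as y has 120.
mismatch_error = ""
try:
    train_test_split(X, df_test, test_size=0.3, random_state=109)
except ValueError as exc:
    mismatch_error = str(exc)
    print(mismatch_error)

# Fix: y is the target column of the same frame, so len(X) == len(y).
y = df_train["Interest_Rate"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=109
)  # 70% training and 30% test

# Train and predict exactly as in the question.
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(len(X_train), len(X_test), len(y_pred))
```

If df_test is meant to be a separate held-out set, the same idea applies: split it into its own features and target columns and call gnb.predict on its features, rather than passing the whole frame to train_test_split as y.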
