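I have built a Naive Bayes machine learning model from two DataFrames named df_test and df_train. I run it in PyCharm with the code below, but it returns:
ValueError: Found input variables with inconsistent numbers of samples: [164309, 109541]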
from sklearn.model_selection import train_test_split
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(df_train.drop(columns = ['Interest_Rate']), df_test, test_size=1.0,random_state=109) # 70% training and 30% test
from sklearn.naive_bayes import GaussianNB
#Create a Gaussian Classifier
gnb = GaussianNB()
#Train the model using the training sets
gnb.fit(X_train, y_train)
#Predict the response for test dataset
y_pred = gnb.predict(X_test)
Where did I go wrong?
You want a 70-30 split, but here you are assigning 100% of the data to the test set. Change test_size to 0.3 (30%) instead of 1.0 (100%).
X_train, X_test, y_train, y_test = train_test_split(df_train.drop(columns = ['Interest_Rate']), df_test, test_size=0.3,random_state=109) # 70% training and 30% test
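Note that scikit-learn raises "inconsistent numbers of samples" when the arrays passed to train_test_split have different row counts, which is the case when the features come from df_train and the second argument is df_test. A minimal sketch of one way to avoid this, assuming Interest_Rate in df_train is the target column: take both the features and the labels from df_train. The small df_train built here, including the feature_a and feature_b columns, is a hypothetical stand-in for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for df_train: two numeric features plus the target column.
rng = np.random.default_rng(109)
df_train = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.normal(size=100),
    "Interest_Rate": rng.integers(1, 4, size=100),
})

# Features and labels come from the same frame, so their row counts match.
X = df_train.drop(columns=["Interest_Rate"])
y = df_train["Interest_Rate"]

# Hold out 30% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=109
)

# Fit Gaussian Naive Bayes on the training part and predict on the held-out part.
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
```

If df_test is an unlabeled set with the same feature columns (an assumption about the data), it would be passed to gnb.predict after fitting rather than to train_test_split.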