I'm trying to write a machine learning model that uses RandomForestClassifier to predict breast cancer. Here is the code:
from sklearn.model_selection import train_test_split
print("Shape of training set:", x_train.shape)
print("Shape of test set:", x_test.shape)
Shape of training set: (292, 30)
Shape of test set: (91, 29)
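train_test_split splits rows only, so if x_train and x_test came out of one call on the same feature table, their column counts would match. A 30 vs 29 mismatch therefore suggests that the test features were built or preprocessed separately, for example with a column dropped from only one of them. The sketch below assumes scikit-learn's bundled breast cancer dataset and illustrative test_size/random_state values; it is not the poster's actual split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Wisconsin breast cancer data
# (30 numeric feature columns); the poster's own dataset may differ.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# A single call splits rows, never columns, so both halves keep every feature.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print("Shape of training set:", x_train.shape)
print("Shape of test set:", x_test.shape)
```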
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_train = ss.fit_transform(x_train)
X_test = ss.fit_transform(x_test)
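Separately from the shape error, the snippet above calls fit_transform on the test set as well, which refits the scaler on test data. The usual pattern is to fit on the training set only and reuse those statistics with transform. A minimal sketch, with hypothetical random data standing in for the real features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 30 feature columns in both splits.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=5.0, scale=2.0, size=(292, 30))
x_test = rng.normal(loc=5.0, scale=2.0, size=(91, 30))

ss = StandardScaler()
X_train = ss.fit_transform(x_train)  # learns per-column mean and std from training data only
X_test = ss.transform(x_test)        # applies the training statistics; no refitting

print(X_train.shape, X_test.shape)
```

With transform instead of fit_transform, a 29-column x_test would already be rejected by the scaler with a similar ValueError, so the mismatch would surface one step earlier.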
Instantiating the RandomForestClassifier:
from sklearn.ensemble import RandomForestClassifier
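# max_features='auto' was deprecated in scikit-learn 1.1 and removed in 1.3; for classifiers it meant 'sqrt'
rand_clf = RandomForestClassifier(
    criterion='entropy', max_depth=11, max_features='sqrt',
    min_samples_leaf=2, min_samples_split=3, n_estimators=130,
)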
rand_clf.fit(X_train, y_train)
I'm stuck here:
y_pred = rand_clf.predict(X_test)
The error is:
ValueError: X has 29 features, but RandomForestClassifier is expecting 30 features as input
How can I fix this? In other words, x_train and x_test do not have the same number of columns.
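The error comes from a check scikit-learn runs at predict time: after fit, the model stores the number of training columns in n_features_in_, and predict rejects input with a different count. A small reproduction, assuming hypothetical random data (30 training columns, 29 test columns) and a reduced n_estimators for speed:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data mirroring the shapes in the question.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(292, 30))
y_train = rng.integers(0, 2, size=292)
X_test = rng.normal(size=(91, 29))  # one column short

rand_clf = RandomForestClassifier(n_estimators=10, random_state=0)
rand_clf.fit(X_train, y_train)
print("Columns seen during fit:", rand_clf.n_features_in_)

# Checking up front gives a clearer message than the exception inside predict().
if X_test.shape[1] != rand_clf.n_features_in_:
    print(f"Test set has {X_test.shape[1]} columns, model expects {rand_clf.n_features_in_}")

raised = False
try:
    rand_clf.predict(X_test)
except ValueError as err:
    raised = True
    print("predict() failed:", err)
```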
The problem is right here:
Shape of training set: (292, 30)
Shape of test set: (91, 29)
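The training and test sets must have the same number of features, either 29 in both or 30 in both, and the columns should be the same ones in the same order.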
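If x_train and x_test are pandas DataFrames (an assumption; the question does not say), comparing column names shows which feature is missing. The better fix is usually to apply the same preprocessing to both sets so the column is never lost; if the column was dropped on purpose, drop it from the training set too and then put the test columns in the training order. A sketch with hypothetical toy columns a, b and c:

```python
import pandas as pd

# Hypothetical toy frames: the test frame lost column "c" somewhere upstream.
x_train = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})
x_test = pd.DataFrame({"b": [7.0], "a": [8.0]})

missing = [col for col in x_train.columns if col not in x_test.columns]
extra = [col for col in x_test.columns if col not in x_train.columns]
print("Missing from test:", missing)
print("Extra in test:", extra)

# Option shown here: keep only the shared columns in both sets.
x_train = x_train.drop(columns=missing)
x_test = x_test.drop(columns=extra)
x_test = x_test[x_train.columns]  # same columns, same order as training

print(list(x_train.columns), list(x_test.columns))
```

After reducing the training columns this way, the scaler and the model have to be refitted on the new x_train before calling predict.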