X has 29 features, but RandomForestClassifier is expecting 30 features as input



I am trying to write a machine learning model that uses RandomForestClassifier to predict breast cancer. The code is as follows:

from sklearn.model_selection import train_test_split
print("Shape of training set:", x_train.shape)
print("Shape of test set:", x_test.shape)

Shape of training set: (292, 30)

Shape of test set: (91, 29)

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_train = ss.fit_transform(x_train)
X_test = ss.fit_transform(x_test)

Instantiating the RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
rand_clf = RandomForestClassifier(criterion = 'entropy', max_depth = 11, max_features = 'auto', min_samples_leaf = 2, min_samples_split = 3, n_estimators = 130)
rand_clf.fit(X_train, y_train)

I'm stuck here:

y_pred = rand_clf.predict(X_test)

The error shown is:

ValueError: X has 29 features, but RandomForestClassifier is expecting 30 features as input

How can I solve this? As the shapes show, x_train and x_test do not have the same number of columns.

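The problem is right here:

Shape of training set: (292, 30)

Shape of test set: (91, 29)

The training set and the test set need to have the same number of features: either 29 in both or 30 in both. The model was fitted on 30 columns, so predict rejects a test set with only 29.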
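One way to keep the two sets aligned is to split a single feature matrix in one train_test_split call, so both halves inherit the same columns. The sketch below is only an illustration under assumptions: it uses scikit-learn's bundled breast cancer dataset (which happens to have 30 features) as a stand-in for the asker's data, the test_size and random_state values are arbitrary, and max_features = 'auto' is left out because recent scikit-learn releases no longer accept it. It also calls transform rather than fit_transform on the test set, so the scaler statistics come from the training data only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset stands in for the asker's own data.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Split features and labels in ONE call so both parts keep the same columns.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Diagnostic: list columns that exist in one set but not in the other.
missing_in_test = sorted(set(x_train.columns) - set(x_test.columns))
missing_in_train = sorted(set(x_test.columns) - set(x_train.columns))
print("Missing in test:", missing_in_test)
print("Missing in train:", missing_in_train)

# Fit the scaler on the training set only, then reuse it on the test set.
ss = StandardScaler()
X_train = ss.fit_transform(x_train)
X_test = ss.transform(x_test)

rand_clf = RandomForestClassifier(
    criterion="entropy", max_depth=11, min_samples_leaf=2,
    min_samples_split=3, n_estimators=130, random_state=0,
)
rand_clf.fit(X_train, y_train)

# Works because X_test has the same number of columns the model was fitted on.
y_pred = rand_clf.predict(X_test)
print("Shape of training set:", X_train.shape)
print("Shape of test set:", X_test.shape)
```

If the two sets come from separate files instead, running the same set-difference check on their column names should reveal which column was dropped from the test set (often the label or an ID column removed from only one of them).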
