sklearn random forests overwriting each other



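I am using sklearn for random forest classification. I want to compare two descriptor sets (one with 125 features, one with 154 features), so I created two separate random forests. They seem to overwrite each other, which leads to this error: 'Number of features of the model must match the input. Model n_features is 125 and input n_features is 154'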

rf_std = RandomForestClassifier(n_estimators = 150, max_depth = 200, max_features = 'sqrt')
rf_nostd = RandomForestClassifier(n_estimators = 150, max_depth = 200, max_features = 'sqrt')
rf_std=rf_std.fit(X_train_std,y_train_std)
print('Testing score std:',rf_std.score(X_test_std,y_test_std))
rf_nostd=rf_nostd.fit(X_train_nostd,y_train_nostd)
print('Testing score nostd:',rf_nostd.score(X_test_nostd,y_test_nostd))
# until here it works
fig, (ax1, ax2) = plt.subplots(1, 2)
disp = plot_confusion_matrix(rf_std, X_test_std, y_test_std,
                             cmap=plt.cm.Blues,
                             normalize='true', ax=ax1)
disp = plot_confusion_matrix(rf_nostd, X_test_nostd, y_test_nostd,
                             cmap=plt.cm.Blues,
                             normalize='true', ax=ax2)
plt.show()
# here I get the error
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-eee9fea5dbfb> in <module>
3 disp = plot_confusion_matrix(rf_std, X_test_std, y_test_std,
4                                  cmap=plt.cm.Blues,
----> 5                                  normalize='true',ax=ax1)
6 disp = plot_confusion_matrix(rf_nostd, X_test_nostd, y_test_nostd,
7                                  cmap=plt.cm.Blues,
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\_plot\confusion_matrix.py in plot_confusion_matrix(estimator, X, y_true, labels, sample_weight, normalize, display_labels, include_values, xticks_rotation, values_format, cmap, ax)
183         raise ValueError("plot_confusion_matrix only supports classifiers")
184 
--> 185     y_pred = estimator.predict(X)
186     cm = confusion_matrix(y_true, y_pred, sample_weight=sample_weight,
187                           labels=labels, normalize=normalize)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict(self, X)
610             The predicted classes.
611         """
--> 612         proba = self.predict_proba(X)
613 
614         if self.n_outputs_ == 1:
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict_proba(self, X)
654         check_is_fitted(self)
655         # Check data
--> 656         X = self._validate_X_predict(X)
657 
658         # Assign chunk of trees to jobs
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in _validate_X_predict(self, X)
410         check_is_fitted(self)
411 
--> 412         return self.estimators_[0]._validate_X_predict(X, check_input=True)
413 
414     @property
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
389                              "match the input. Model n_features is %s and "
390                              "input n_features is %s "
--> 391                              % (self.n_features_, n_features))
392 
393         return X
ValueError: Number of features of the model must match the input. Model n_features is 125 and input n_features is 154 

Edit: fitting the second random forest somehow overwrites the first one, as shown here:

rf_std=rf_std.fit(X_train_std,y_train_std)
print(rf_std.n_features_)
rf_nostd=rf_nostd.fit(X_train_nostd,y_train_nostd)
print(rf_std.n_features_)
Output:
154
125

Why are the two models not kept separate? Can anyone help?

I was able to reproduce this error when the train and test inputs have inconsistent shapes.

Try this:

assert X_train_std.shape[-1] == X_test_std.shape[-1], "Input shapes don't match."
assert X_train_nostd.shape[-1] == X_test_nostd.shape[-1], "Input shapes don't match."
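
A slightly more descriptive variant of the same check, as a rough sketch: instead of stopping at the first failed assertion, it reports every train/test pair whose column counts differ. The helper name `feature_mismatches` and the random stand-in arrays are made up for illustration and are not part of the original code.

```python
import numpy as np


def feature_mismatches(pairs):
    """Return {name: (n_train_features, n_test_features)} for mismatched pairs."""
    bad = {}
    for name, (train, test) in pairs.items():
        n_train = np.asarray(train).shape[1]
        n_test = np.asarray(test).shape[1]
        if n_train != n_test:
            bad[name] = (n_train, n_test)
    return bad


# Hypothetical stand-in data: the "nostd" test set deliberately has the wrong width.
rng = np.random.default_rng(0)
pairs = {
    "std": (rng.random((40, 154)), rng.random((10, 154))),
    "nostd": (rng.random((40, 125)), rng.random((10, 154))),
}

mismatches = feature_mismatches(pairs)
print(mismatches)  # {'nostd': (125, 154)}
```

With the real data, the dictionary would be built from `X_train_std`, `X_test_std`, `X_train_nostd` and `X_test_nostd`; an empty result means the shapes agree.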

This is how I reproduced your error:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
X_train_std = np.random.rand(400, 154)
X_test_std = np.random.rand(100, 125)
y_train_std = np.random.randint(0, 2, 400).tolist()
y_test_std = np.random.randint(0, 2, 100).tolist()
rf_std = RandomForestClassifier(n_estimators=150,
                                max_depth=200, max_features='sqrt')
rf_std=rf_std.fit(X_train_std,y_train_std)
print('Testing score std:',rf_std.score(X_test_std,y_test_std))

ValueError: Number of features of the model must match the input. Model n_features is 154 and input n_features is 125

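This usually happens when the training and test sets do not have the same number of features. Can you check whether the following comparisons both evaluate to True?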

X_train_std.shape[1] == X_test_std.shape[1]
X_train_nostd.shape[1] == X_test_nostd.shape[1]

If both match, you are good to go; otherwise you will have to track down where the discrepancy is introduced.
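
To rule out the models themselves, here is a minimal sketch on random stand-in data (not the asker's data). Two `RandomForestClassifier` objects created by separate constructor calls each keep their own fitted state. It assumes scikit-learn 0.24 or newer, where the fitted attribute is `n_features_in_`; the older `n_features_` used in the question was later deprecated and removed. The names `rf_a`, `rf_b`, `X_a` and `X_b` are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data with the two feature counts from the question.
rng = np.random.default_rng(0)
X_a = rng.random((60, 154))
X_b = rng.random((60, 125))
y = rng.integers(0, 2, 60)

# Two separate constructor calls give two independent estimator objects.
rf_a = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_a, y)
rf_b = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_b, y)

# Fitting rf_b leaves rf_a's recorded feature count untouched.
print(rf_a.n_features_in_, rf_b.n_features_in_)  # 154 125

# Passing data with the wrong width raises the kind of ValueError seen above.
raised = False
try:
    rf_a.predict(X_b)
except ValueError:
    raised = True
print("mismatch detected:", raised)
```

If the real `rf_std` and `rf_nostd` report swapped feature counts, one thing worth checking is whether the variables were reassigned, or the data arrays swapped, in an earlier notebook cell.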

Regards,
MJ
