KNN with RFECV returns: "The classifier does not expose "coef_" or "feature_importances_" attributes"



I am trying to apply RFECV to a KNeighborsClassifier to eliminate insignificant features. To make the problem reproducible, here is an example using the iris data:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
y = iris.target
X = iris.data
estimator = KNeighborsClassifier()
selector = RFECV(estimator, step=1, cv=5)
selector = selector.fit(X, y)

This results in the following error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-19f0f2f0f0e7> in <module>()
      7 estimator = KNeighborsClassifier()
      8 selector = RFECV(estimator, step=1, cv=5)
----> 9 selector.fit(X, y)
C:\...\Anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py in fit(self, X, y)
    422                       verbose=self.verbose - 1)
    423 
--> 424             rfe._fit(X_train, y_train, lambda estimator, features:
    425                      _score(estimator, X_test[:, features], y_test, scorer))
    426             scores.append(np.array(rfe.scores_[::-1]).reshape(1, -1))
C:\...\Anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py in _fit(self, X, y, step_score)
    180                 coefs = estimator.feature_importances_
    181             else:
--> 182                 raise RuntimeError('The classifier does not expose '
    183                                    '"coef_" or "feature_importances_" '
    184                                    'attributes')
RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes

If I change the classifier to an SVC, as in:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC
iris = load_iris()
y = iris.target
X = iris.data
estimator = SVC(kernel="linear")
selector = RFECV(estimator, step=1, cv=5)
selector = selector.fit(X, y)

it works fine. Any suggestions on how to solve this?

Note: I updated Anaconda yesterday, which also updated sklearn.

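The error is self-explanatory: KNN provides no logic for ranking features. You cannot use it (sklearn's implementation) for this purpose unless you define your own feature-importance measure for KNN. As far as I know, no such general measure exists, which is why scikit-learn does not implement one. An SVM with a linear kernel, on the other hand, exposes this information (through coef_), as every linear model does.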
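To make the difference concrete, here is a minimal sketch that fits both estimators from the question on the iris data and checks which of the two attributes named in the error message each one exposes. The variable names (`knn_has_ranking`, `svc_has_ranking`) are only illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Fit both estimators from the question on the same data.
knn = KNeighborsClassifier().fit(X, y)
svc = SVC(kernel="linear").fit(X, y)

# RFECV needs one of these two attributes to rank features.
knn_has_ranking = hasattr(knn, "coef_") or hasattr(knn, "feature_importances_")
svc_has_ranking = hasattr(svc, "coef_")

print("KNN exposes a ranking attribute:", knn_has_ranking)
print("Linear SVC exposes coef_:", svc_has_ranking)
# One row of weights per pair of classes, one column per feature.
print("coef_ shape:", svc.coef_.shape)
```

The fitted KNN model has neither attribute, so RFECV has nothing to rank features by, whereas the linear SVC provides one weight per feature for each pair of classes.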

You may find a partial solution in the mlxtend library:

http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/

See https://github.com/rasbt/mlxtend

As for scikit-learn, see:

https://github.com/scikit-learn/scikit-learn/issues/6920
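As a rough sketch of the sequential-selection idea, and assuming a scikit-learn version of 0.24 or later, scikit-learn's own `SequentialFeatureSelector` can wrap a KNeighborsClassifier, because it ranks candidate feature subsets by cross-validated score rather than by `coef_` or `feature_importances_`. Note that this is a different technique from RFECV: it does not choose the number of features automatically, so `n_features_to_select=2` below is an arbitrary choice made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
estimator = KNeighborsClassifier()

# Greedy backward elimination driven by 5-fold cross-validated accuracy.
# n_features_to_select=2 is an assumption for this example, not a recommendation.
selector = SequentialFeatureSelector(
    estimator,
    n_features_to_select=2,
    direction="backward",
    cv=5,
)
selector.fit(X, y)

support = selector.get_support()   # boolean mask over the 4 iris features
X_reduced = selector.transform(X)  # data restricted to the kept features

print("Selected feature mask:", support)
print("Reduced data shape:", X_reduced.shape)
```

Because the selector only needs `fit` and a score, the same pattern should work with any estimator, at the cost of refitting the model many times.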
