I am trying to apply RFECV to a KNeighborsClassifier to eliminate insignificant features. To make the problem reproducible, here is an example using the iris data:
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
y = iris.target
X = iris.data
estimator = KNeighborsClassifier()
selector = RFECV(estimator, step=1, cv=5)
selector = selector.fit(X, y)
This results in the following error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-27-19f0f2f0f0e7> in <module>()
7 estimator = KNeighborsClassifier()
8 selector = RFECV(estimator, step=1, cv=5)
----> 9 selector.fit(X, y)
C:...\Anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py in fit(self, X, y)
422 verbose=self.verbose - 1)
423
--> 424 rfe._fit(X_train, y_train, lambda estimator, features:
425 _score(estimator, X_test[:, features], y_test, scorer))
426 scores.append(np.array(rfe.scores_[::-1]).reshape(1, -1))
C:...\Anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py in _fit(self, X, y, step_score)
180 coefs = estimator.feature_importances_
181 else:
--> 182 raise RuntimeError('The classifier does not expose '
183 '"coef_" or "feature_importances_" '
184 'attributes')
RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
If I change the classifier to SVC, like this:
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC
iris = load_iris()
y = iris.target
X = iris.data
estimator = SVC(kernel="linear")
selector = RFECV(estimator, step=1, cv=5)
selector = selector.fit(X, y)
it works fine. Any suggestions on how to solve this problem?
Note: I updated Anaconda yesterday, which also updated sklearn.
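The error is self-explanatory: RFE ranks features using the fitted estimator's coef_ or feature_importances_ attribute, and KNN exposes neither, so it provides no basis for this kind of feature selection. You cannot use it (sklearn's implementation) for this purpose unless you define your own feature importance measure for KNN. As far as I know, there is no generally accepted measure of that kind, which is why scikit-learn does not implement one. An SVC with a linear kernel, on the other hand, exposes coef_ like any other linear model, so RFECV can work with it.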
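One way to "define your own importance measure" for KNN is permutation importance: shuffle one feature at a time and see how much the held-out score drops. The following is only a minimal sketch, assuming scikit-learn 0.22 or later (where sklearn.inspection.permutation_importance is available). The train/test split, n_repeats=20 and random_state=0 are arbitrary choices for illustration, and this is not something RFECV does for you.

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Hold out a test set so importances reflect generalisation, not memorisation.
# The 70/30 split and the seed are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

knn = KNeighborsClassifier().fit(X_train, y_train)

# Shuffle each feature column n_repeats times and record the drop in accuracy.
result = permutation_importance(
    knn, X_test, y_test, n_repeats=20, random_state=0
)

# Feature indices ordered from most to least important.
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking:
    print(f"{iris.feature_names[idx]}: {result.importances_mean[idx]:.3f}")
```

Features whose mean importance is close to zero are candidates for removal, but you would still have to drive the elimination loop yourself.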
You may find a partial solution in the mlxtend library:
http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
See https://github.com/rasbt/mlxtend
As for scikit-learn, see:
https://github.com/scikit-learn/scikit-learn/issues/6920
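Sequential feature selection, the approach mlxtend's SequentialFeatureSelector takes, does not need coef_ or feature_importances_, because it only compares cross-validated scores of feature subsets. A rough sketch with scikit-learn's own class of the same name, assuming version 0.24 or later; n_features_to_select=2 is an arbitrary value picked for illustration, and unlike RFECV the number of features is fixed here rather than chosen by cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

estimator = KNeighborsClassifier()

# Backward selection: start from all features and repeatedly drop the one
# whose removal hurts the 5-fold cross-validated score the least.
selector = SequentialFeatureSelector(
    estimator, n_features_to_select=2, direction="backward", cv=5
)
selector.fit(X, y)

# Boolean mask over the original columns; True marks a kept feature.
mask = selector.get_support()
selected = [name for name, keep in zip(iris.feature_names, mask) if keep]
print(selected)

# Reduced design matrix containing only the selected columns.
X_reduced = selector.transform(X)
```

This refits the classifier many times, so it can get slow on wide datasets, but it works with any estimator that can be scored.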