Fixing a "nan" score in forward stepwise selection with Python



I am using sequential feature selection (SFS) from mlxtend to run stepwise feature selection.

x_train, x_test = train_test_split(x, test_size = 0.2, random_state = 0)
y_train, y_test = train_test_split(y, test_size = 0.2, random_state = 0)
sfs = SFS(RandomForestClassifier(n_estimators=100, random_state=0, n_jobs = -1),
k_features = 28,
forward = True,
floating = False,
verbose= 2,
scoring= "r2",
cv = 4,
n_jobs = -1
).fit(x_train, y_train)

The code runs, but the score it reports is NaN.

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  28 out of  28 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  28 out of  28 | elapsed:    0.1s finished
[2021-12-30 14:15:17] Features: 1/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    0.0s finished
[2021-12-30 14:15:17] Features: 2/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 out of  26 | elapsed:    0.0s finished

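If you are doing classification, you should not use r2 for scoring; it is a regression metric. See the scikit-learn documentation for the list of metrics available for classification and for regression.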

You should also state in the question that you are using SequentialFeatureSelector from mlxtend.

Below I used accuracy instead, and it works:

from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic classification data: 50 features, 28 of them informative
x, y = make_classification(n_features=50, n_informative=28)

# Split features and labels in one call so the rows stay aligned
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

sfs = SFS(
    RandomForestClassifier(),
    k_features=28,
    forward=True,
    floating=False,
    verbose=2,
    scoring="accuracy",  # a classification metric instead of "r2"
).fit(x_train, y_train)
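
If you want to check a scoring string before starting a long selection run, one option is to try it on the same kind of estimator with scikit-learn's cross_val_score. This is only a minimal sketch: the dataset sizes and n_estimators=20 are illustrative assumptions, not values from the question, and it does not reproduce the original nan. Passing error_score="raise" asks scikit-learn to raise the underlying error rather than fill in nan, which can make the cause easier to see.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Small synthetic classification problem (sizes are illustrative assumptions)
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)
clf = RandomForestClassifier(n_estimators=20, random_state=0)

# Try the scoring string with the same number of folds as the SFS call (cv=4).
# error_score="raise" surfaces the exception instead of recording nan.
scores = cross_val_score(
    clf, X, y, scoring="accuracy", cv=4, error_score="raise"
)

print("fold scores:", scores)
print("any nan:", np.isnan(scores).any())
```

If this quick check returns finite values for your own data and estimator, the same scoring string is more likely to behave inside SFS as well.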
