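I am using StandardScaler, PCA, and Random Forest to classify some data. I would like to use the Pipeline approach, but I don't know how to tell the pipeline that I want n_components to be whatever number reaches 95% explained variance. How can I set up the code so that this number is computed inside the pipeline?
The code is as follows: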
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([('scaler', StandardScaler()),
                 # ('pca', PCA(n_components=n_to_reach_95)),
                 ('pca', PCA(n_components=15)),
                 ('clf', RandomForestClassifier())])

# Declare a hyperparameter grid
parameter_space = {
    'clf__n_estimators': [10, 50, 100],
    'clf__criterion': ['gini', 'entropy'],
    # max_depth must be an integer, so request integer values from linspace
    'clf__max_depth': np.linspace(10, 50, 11, dtype=int),
}

clf = GridSearchCV(pipe, parameter_space, cv=5, scoring="accuracy", verbose=True)  # model
# Fit the grid search, not the bare pipeline, so the grid is actually searched
clf.fit(X_train, y_train)
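sklearn actually supports this directly: pass a float, as in n_components=0.95. When n_components is a float between 0 and 1, PCA keeps the smallest number of components whose cumulative explained variance ratio exceeds that fraction, and it determines that number each time the pipeline is fitted.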
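As a minimal sketch of that suggestion, the pipeline below swaps the fixed component count for n_components=0.95 and then reads back how many components PCA kept. The synthetic data from make_classification, the reduced parameter grid, and the names search, fitted_pca, n_kept, and explained are assumptions made for illustration; they stand in for the X_train and y_train from the question, which are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's X_train / y_train (assumption).
X_train, y_train = make_classification(
    n_samples=200, n_features=30, n_informative=10, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # A float in (0, 1) asks PCA to keep enough components to explain
    # more than that fraction of the variance.
    ("pca", PCA(n_components=0.95)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Small grid to keep the example quick; extend it as needed.
parameter_space = {
    "clf__n_estimators": [10, 50],
    "clf__max_depth": [10, 30],
}

search = GridSearchCV(pipe, parameter_space, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Inspect the PCA step of the refitted best pipeline.
fitted_pca = search.best_estimator_.named_steps["pca"]
n_kept = fitted_pca.n_components_
explained = fitted_pca.explained_variance_ratio_.sum()
print(f"components kept: {n_kept}, variance explained: {explained:.3f}")
```

Because PCA sits inside the pipeline, the number of components is recomputed from the training portion of each cross-validation fold, so it can differ slightly from fold to fold. The n_components_ value printed here is the one from the final refit on all of X_train.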