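I am new to sklearn pipelines and am following the sample code. I have seen other examples call pipeline.fit_transform(train_X), so I tried the same thing on this pipeline with pipeline.fit_transform(X), but it raises an error:
"return self.fit(X, **fit_params).transform(X)
TypeError: fit() takes exactly 3 arguments (2 given)"
If I remove the SVM step and define the pipeline as pipeline = Pipeline([("features", combined_features)]), I still see the error.
Does anyone know why fit_transform does not work here?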
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
iris = load_iris()
X, y = iris.data, iris.target
# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)
# Maybe some original features were good, too?
selection = SelectKBest(k=1)
# Build estimator from PCA and Univariate selection:
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
# Use combined features to transform dataset:
X_features = combined_features.fit(X, y).transform(X)
svm = SVC(kernel="linear")
# Do grid search over k, n_components and C:
pipeline = Pipeline([("features", combined_features), ("svm", svm)])
param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
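You get the error in the example above because you also need to pass the labels to the pipeline. You should call pipeline.fit_transform(X, y). The last step in the pipeline is a classifier, SVC, and a classifier's fit method requires the labels as a mandatory argument. This holds for every classifier, because classification algorithms use the labels to train the classifier's weights.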
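As a rough sketch of passing the labels through the full pipeline, the snippet below rebuilds the question's pipeline and fits it with both X and y. It assumes a reasonably recent scikit-learn, and the variable name predictions is only illustrative. Because SVC has no transform method, the sketch calls fit and then predict on the full pipeline rather than fit_transform.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same structure as the pipeline in the question.
combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(k=1)),
])
pipeline = Pipeline([
    ("features", combined_features),
    ("svm", SVC(kernel="linear")),
])

# The labels y are handed to every step that needs them, including SVC.fit.
pipeline.fit(X, y)

# SVC has no transform method, so predict is used on the fitted pipeline.
predictions = pipeline.predict(X)
print(predictions.shape)
```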
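Similarly, even if you remove the SVC, you will still get the error, because the fit method of the SelectKBest class also requires both X and y. Without y, fit_transform calls self.fit(X, **fit_params), the line shown in the traceback, so fit never receives the labels.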
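For the features-only pipeline, a minimal sketch of the corrected call might look like this. It assumes the same iris data and transformers as in the question. The names features_only and X_features are illustrative, not part of any API.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline

X, y = load_iris(return_X_y=True)

combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(k=1)),
])

# Pipeline containing only the transformers, as in the question's second attempt.
features_only = Pipeline([("features", combined_features)])

# Passing y lets SelectKBest score each feature against the labels.
X_features = features_only.fit_transform(X, y)

# Two PCA components plus one selected feature give three columns.
print(X_features.shape)  # (150, 3)
```

Here every step is a transformer, so fit_transform is available and returns the combined feature matrix.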