fit vs. fit_transform in a Pipeline

On this page: https://www.kaggle.com/baghern/a-deep-dive-into-sklearn-pipelines

it calls fit_transform to transform the data, as shown below:

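from sklearn.pipeline import FeatureUnion, Pipeline

# text, length, words, ... are transformers defined earlier (not shown here)
feats = FeatureUnion([('text', text),
                      ('length', length),
                      ('words', words),
                      ('words_not_stopword', words_not_stopword),
                      ('avg_word_length', avg_word_length),
                      ('commas', commas)])
feature_processing = Pipeline([('feats', feats)])
# Fit every transformer on X_train and return the combined feature matrix
feature_processing.fit_transform(X_train)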

During training with the feature processing step, however, it only calls fit and then predict:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('features', feats),
    ('classifier', RandomForestClassifier(random_state=42)),
])
pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)
# Fraction of correct predictions on the test set (accuracy)
np.mean(preds == y_test)

The question is: in the second case, does fit also transform X_train (the way transform would), given that we never call fit_transform here?

sklearn's Pipeline has some nice features. It performs several tasks in a very clean way: we define our features, their transformations, and the classifier we want to run, all in a single object.

In the first step of

pipeline = Pipeline([
    ('features',feats),
    ('classifier', RandomForestClassifier(random_state = 42)),
])

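you have defined a name for the features and their transformation (contained in feats); in the second step you have defined a name for the classifier and the classifier itself.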
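To make the (name, estimator) structure concrete, here is a minimal sketch. The question's text, length, and other transformers are not shown, so StandardScaler and PCA are used as stand-ins for them; that substitution is an assumption for illustration only.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the question's `feats` (assumption: two simple numeric transformers)
feats = FeatureUnion([
    ("scaled", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

pipeline = Pipeline([
    ("features", feats),
    ("classifier", RandomForestClassifier(random_state=42)),
])

# Each step is a (name, estimator) tuple, in execution order
step_names = [name for name, _ in pipeline.steps]

# Steps can be looked up by the name given in the tuple
features_step = pipeline.named_steps["features"]
classifier_step = pipeline.named_steps["classifier"]
```

The names are only labels; they are what you would use to reach a step later, for example when inspecting the fitted classifier.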

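Now, when you call pipeline.fit, it first fits the features and transforms them (in effect a fit_transform on the feature step), then fits the classifier on the transformed features. So yes, X_train is transformed; the pipeline simply performs those steps for us. When you call pipeline.predict, the already fitted feature step only transforms X_test before the classifier predicts. You can read more about it here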
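The behaviour described above can be checked by doing the same steps by hand and comparing predictions. This is a sketch under assumptions: the data is synthetic (make_classification), and StandardScaler plus PCA stand in for the question's feature transformers; make_feats is a hypothetical helper introduced only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption: stands in for the question's X_train / X_test)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def make_feats():
    """Hypothetical helper: build a fresh, unfitted feature union."""
    return FeatureUnion([
        ("scaled", StandardScaler()),
        ("pca", PCA(n_components=2)),
    ])


# Route 1: let the pipeline handle everything
pipeline = Pipeline([
    ("features", make_feats()),
    ("classifier", RandomForestClassifier(random_state=42)),
])
pipeline.fit(X_train, y_train)
pipeline_preds = pipeline.predict(X_test)

# Route 2: the same steps written out by hand
manual_feats = make_feats()
manual_clf = RandomForestClassifier(random_state=42)
X_train_t = manual_feats.fit_transform(X_train)   # fit + transform on training data
manual_clf.fit(X_train_t, y_train)                # classifier sees transformed features
X_test_t = manual_feats.transform(X_test)         # transform only, no refitting
manual_preds = manual_clf.predict(X_test_t)

same = np.array_equal(pipeline_preds, manual_preds)
```

Because both routes apply the same operations with the same random_state, same is expected to be True here. Note that the test data goes through transform only, so nothing is refitted on X_test.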
