How to pass a parameter to only one part of a Pipeline object in scikit-learn

I need to pass a parameter, sample_weight, to my RandomForestClassifier like so:

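import numpy as np
import sklearn.ensemble
import sklearn.feature_selection
import sklearn.pipeline

X = np.array([[2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,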
        1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 3.0,
        2.0, '0'],
       [15.0, 2.0, 5.0, 5.0, 0.466666666667, 4.0, 3.0, 2.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
        7.0, 14.0, 2.0, '0'],
       [3.0, 4.0, 3.0, 1.0, 1.33333333333, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        9.0, 8.0, 2.0, '0'],
       [3.0, 2.0, 3.0, 0.0, 0.666666666667, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
        5.0, 3.0, 1.0, '0']], dtype=object)
y = np.array([ 0.,  0.,  1.,  0.])
m = sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=100,
        min_samples_leaf=5, 
        max_depth=10)
m.fit(X, y, sample_weight=np.array([3,4,2,3]))

The code above works perfectly fine. Then I tried to do the same thing with a Pipeline object instead of the bare random forest:

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=500,
        min_samples_leaf=5, 
        max_depth=10))])
m.fit(X, y, sample_weight=np.array([3,4,2,3]))

This now breaks in the fit method with "ValueError: need more than 1 value to unpack":

ValueError                                Traceback (most recent call last)
<ipython-input-212-c4299f5b3008> in <module>()
     25         max_depth=10))])
     26 
---> 27 m.fit(X, y, sample_weights=np.array([3,4,2,3]))
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
    128         data, then fit the transformed data using the final estimator.
    129         """
--> 130         Xt, fit_params = self._pre_transform(X, y, **fit_params)
    131         self.steps[-1][-1].fit(Xt, y, **fit_params)
    132         return self
/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
    113         fit_params_steps = dict((step, {}) for step, _ in self.steps)
    114         for pname, pval in six.iteritems(fit_params):
--> 115             step, param = pname.split('__', 1)
    116             fit_params_steps[step][param] = pval
    117         Xt = X
ValueError: need more than 1 value to unpack

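I am using sklearn version 0.14.
I think the problem is that the feature selection step in the pipeline does not take a sample_weight argument. How do I pass this parameter to only one step in the pipeline when calling fit? Thanks.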

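From the documentation:

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a "__", as in the example below.

So you can simply insert model__ in front of whatever fit parameter kwargs you want to pass to your 'model' step: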

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
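
For a version that runs end to end, here is a minimal sketch of the same idea on synthetic data. The random data, the f_classif score function, and the smaller k and n_estimators values are assumptions chosen only to keep the example small; they are not taken from the question. It also shows why the un-prefixed name fails in the traceback above: splitting a name without "__" yields a single piece, which cannot be unpacked into a step name and a parameter name.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption for illustration, not the question's data).
rng = np.random.RandomState(0)
X = rng.rand(40, 6)
y = np.array([0, 1] * 20)
weights = rng.randint(1, 5, size=40)

pipe = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=3)),
    ("model", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# The "model__" prefix routes sample_weight to the fit() of the "model" step only.
pipe.fit(X, y, model__sample_weight=weights)
predictions = pipe.predict(X)

# The pipeline code in the traceback splits each fit parameter name on "__".
prefixed_parts = "model__sample_weight".split("__", 1)  # two pieces: step, parameter

# An un-prefixed name yields one piece, so unpacking into two names fails.
try:
    step, param = "sample_weight".split("__", 1)
    unprefixed_fails = False
except ValueError:
    unprefixed_fails = True
```

The feature selection step never sees sample_weight here; only the step named before the double underscore receives it.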

You can also use the set_params method and prepend the name of the step.

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0, 
        oob_score=True, 
        n_estimators=500,
        min_samples_leaf=5, 
        max_depth=10))])

m.set_params(model__sample_weight=np.array([3,4,2,3]))

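I wish I could leave this as a comment on @rovyko's post above instead of a separate answer, but I don't have enough Stack Overflow reputation to comment yet, so here it is.

You cannot use:

Pipeline.set_params(model__sample_weight=np.array([3,4,2,3]))

to set a parameter of the RandomForestClassifier.fit() method. As the source code shows, Pipeline.set_params() only works for the initialization parameters of the individual steps in the pipeline. RandomForestClassifier has no initialization parameter named sample_weight (see its __init__() method). sample_weight is actually an input parameter of RandomForestClassifier's fit() method, so it can only be set by the method given in the correctly marked answer by @ali_m, i.e.,

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))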
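
The difference can be checked directly. The following is a small sketch, assuming a reasonably recent scikit-learn release; the reduced k and n_estimators values are illustrative only. set_params accepts a constructor parameter such as n_estimators through the step prefix, but rejects sample_weight with a ValueError because it is not a constructor parameter.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=3)),
    ("model", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# Constructor (initialization) parameters can be set through the step prefix.
pipe.set_params(model__n_estimators=50)

# sample_weight is an argument of fit(), not of __init__(), so set_params rejects it.
try:
    pipe.set_params(model__sample_weight=[3, 4, 2, 3])
    raised = False
except ValueError:
    raised = True
```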
