Early stopping in sklearn GradientBoostingRegressor



I am using the monitor class implemented here:

from sklearn.ensemble._gradient_boosting import predict_stage  # private sklearn API; module path may vary across versions

class Monitor():
    """Monitor for early stopping in gradient boosting.
    The monitor checks the validation loss between each training stage. When
    too many successive stages have increased the loss, the monitor will return
    true, stopping the training early.
    Parameters
    ----------
    X_valid : array-like, shape = [n_samples, n_features]
      Training vectors, where n_samples is the number of samples
      and n_features is the number of features.
    y_valid : array-like, shape = [n_samples]
      Target values (integers in classification, real numbers in
      regression)
      For classification, labels must correspond to classes.
    max_consecutive_decreases : int, optional (default=5)
      Early stopping criteria: when the number of consecutive iterations that
      result in a worse performance on the validation set exceeds this value,
      the training stops.
    """
    def __init__(self, X_valid, y_valid, max_consecutive_decreases=5):
        self.X_valid = X_valid
        self.y_valid = y_valid
        self.max_consecutive_decreases = max_consecutive_decreases
        self.losses = []

    def __call__(self, i, clf, args):
        if i == 0:
            self.consecutive_decreases_ = 0
            self.predictions = clf._init_decision_function(self.X_valid)
        predict_stage(clf.estimators_, i, self.X_valid, clf.learning_rate,
                      self.predictions)
        self.losses.append(clf.loss_(self.y_valid, self.predictions))
        if len(self.losses) >= 2 and self.losses[-1] > self.losses[-2]:
            self.consecutive_decreases_ += 1
        else:
            self.consecutive_decreases_ = 0
        if self.consecutive_decreases_ >= self.max_consecutive_decreases:
            print("Validation loss worsened for {} consecutive stages: "
                  "stopping at iteration {}.".format(self.consecutive_decreases_, i))
            return True
        else:
            return False
params = { 'n_estimators':             nEstimators,
           'max_depth':                maxDepth,
           'min_samples_split':        minSamplesSplit,
           'min_samples_leaf':         minSamplesLeaf,
           'min_weight_fraction_leaf': minWeightFractionLeaf,
           'min_impurity_decrease':    minImpurityDecrease,
           'learning_rate':            0.01,
           'loss':                    'quantile',
           'alpha':                    alpha,
           'verbose':                  0
           }
model = ensemble.GradientBoostingRegressor( **params )
model.fit( XTrain, yTrain, monitor = Monitor( XTest, yTest, 25 ) )  
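As an aside, newer scikit-learn versions ship built-in early stopping for gradient boosting via the `validation_fraction` and `n_iter_no_change` parameters, which avoids the private APIs used by the monitor above. A minimal sketch on synthetic data (parameter values are illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Built-in early stopping: 20% of the training data is held out internally,
# and training stops once the validation score has not improved for
# n_iter_no_change consecutive stages.
model = GradientBoostingRegressor(
    n_estimators=1000,
    learning_rate=0.01,
    loss="quantile",
    alpha=0.5,
    validation_fraction=0.2,
    n_iter_no_change=25,
    random_state=0,
)
model.fit(X, y)

# n_estimators_ reports how many stages were actually trained.
print(model.n_estimators_)
```

After fitting, `model.n_estimators_` tells you how many stages were actually kept, so the estimator you end up with is the one trained up to the stop.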

It works well. However, it is not clear to me whether this line

model.fit( XTrain, yTrain, monitor = Monitor( XTest, yTest, 25 ) )

returns:

1) no model at all

2) the model trained up to the point where training stopped

3) the model from 25 iterations before the stop (note the monitor's parameter)

If it is not (3), is it possible to make the estimator return (3)?

How can I do that?

It is worth mentioning that the XGBoost library does this, but it does not allow the loss function I need.

The model returned by fit is the one trained up to the point where the "stopping rule" fired, meaning your option (2) is correct.

The problem with this monitor code is that the model you end up with includes the 25 extra iterations. The selected model should instead be your option (3).

I think a simple (naive) way to do this is to rerun the same model (with a fixed seed, so the results are identical) but keep only the first (i - max_consecutive_decreases) stages.
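The rollback suggested above can also be done without retraining, by truncating the fitted ensemble. This is a sketch, not an official API: it assumes, per scikit-learn internals, that `predict` simply sums the stages stored in the private `estimators_` array, and the `truncate` helper below is a hypothetical name. Verify the result against `staged_predict`, which is public:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

def truncate(model, n_stages):
    """Keep only the first n_stages boosting stages of a fitted model.
    Relies on private attributes; a sketch, not a supported API."""
    model.estimators_ = model.estimators_[:n_stages]
    model.train_score_ = model.train_score_[:n_stages]
    return model

# Prediction at stage 75, obtained from the public staged_predict generator
# (stage indices are 0-based), materialized before truncating.
stage_preds = list(model.staged_predict(X))
expected = stage_preds[74]

truncate(model, 75)
assert np.allclose(model.predict(X), expected)
```

This avoids a second training run entirely: after the monitor fires at iteration i, truncating to (i - max_consecutive_decreases) stages gives the model from before the validation loss started worsening.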

