XGBoost: early stopping on the default metric, not on the custom evaluation function

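I'm using XGBoost 0.90. I want to train an XGBoost regression model in Python with a built-in learning objective and early stopping on a built-in evaluation metric. Easy enough. In my case the objective is "reg:tweedie" and the evaluation metric is "tweedie-nloglik". But at every iteration I would also like to compute an informative custom metric, which should not be used for early stopping. That is where it goes wrong.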

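Ultimately, I want to use scikit-learn's GridSearchCV to train a set of models that use a built-in objective and a built-in metric for early stopping, and then pick the model that performs best on the custom metric.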
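One way to reach that goal without GridSearchCV (a swapped-in technique, not the approach in the question) is a plain loop over the parameter grid. This is a minimal sketch under stated assumptions: fit_and_score is a hypothetical callable you would write yourself, which trains one model with early stopping on the built-in metric and returns that model's custom-metric value, assumed to be lower-is-better.

```python
import itertools

def manual_grid_search(param_grid, fit_and_score):
    """Try every combination in param_grid (dict: name -> list of values).

    fit_and_score(params) is a hypothetical, caller-supplied function that
    trains one model (early stopping on a built-in metric) and returns its
    custom-metric score, where lower is better.
    Returns (best_params, best_score).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float('inf')
    # Cartesian product of all candidate values, one dict per combination.
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = fit_and_score(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Keeping the selection loop separate from training makes the rule explicit: early stopping decides how many rounds each candidate gets, and the custom metric alone decides which candidate wins.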

In this example code I use a different built-in objective and built-in metric, but the problem is the same.

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

def mymetric(pred, dmat):
    y = dmat.get_label()
    res = np.sqrt(np.sum((y - pred)**4)/len(y))
    return 'mymetric', float(res)

np.random.seed(seed=2500)
x, y, weight = np.random.randn(4096, 16), np.random.randn(4096), np.random.random(4096)
train_x, test_x, train_y, test_y, train_weight, test_weight = train_test_split(x, y, weight, 
         train_size=0.7, random_state=32)
dtrain = xgb.DMatrix(train_x, label=train_y, weight=train_weight)
dtest = xgb.DMatrix(test_x, label=test_y, weight=test_weight)
results_learning = {}
bst = xgb.train(params={'objective': 'reg:squarederror',
                        'eval_metric': 'rmse',
                        'disable_default_eval_metric': 0},
                num_boost_round=20, dtrain=dtrain,
                evals=[(dtrain, 'dtrain'), (dtest, 'dtest')],
                evals_result=results_learning,
                feval=mymetric,
                early_stopping_rounds=3)

The output is as follows (without feval, it would have stopped at iteration 3):

[0] dtrain-rmse:1.02988 dtest-rmse:1.11216  dtrain-mymetric:1.85777 dtest-mymetric:2.15138
Multiple eval metrics have been passed: 'dtest-mymetric' will be used for early stopping.
Will train until dtest-mymetric hasn't improved in 3 rounds.
...
Stopping. Best iteration:
[4] dtrain-rmse:0.919674    dtest-rmse:1.08358  dtrain-mymetric:1.56446 dtest-mymetric:1.9885

How can I get output like this instead?

[0] dtrain-rmse:1.02988 dtest-rmse:1.11216  dtrain-mymetric:1.85777 dtest-mymetric:2.15138
Multiple eval metrics have been passed: 'dtest-rmse' will be used for early stopping.
Will train until dtest-rmse hasn't improved in 3 rounds.
...
Stopping. Best iteration:
[3] dtrain-rmse:0.941712    dtest-rmse:1.0821   dtrain-mymetric:1.61367 dtest-mymetric:1.99428

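I could work around this with a custom evaluation function that returns a list of tuples (https://github.com/dmlc/xgboost/issues/1125). But is that possible when I want to use built-in evaluation metrics such as "rmse" or "tweedie-nloglik"? Can I call them from inside a custom evaluation function?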
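For the list-of-tuples workaround, here is a minimal sketch. It does not call XGBoost's built-in metric; it recomputes RMSE by hand, weighted the way the built-in "rmse" is, i.e. sqrt(sum(w*(y-p)^2)/sum(w)). It assumes, as the question indicates, that in 0.90 feval may return a list of (name, value) tuples, and that the legacy early-stopping logic monitors the last metric reported for the last entry in evals; that is why the hand-written RMSE (hypothetically named 'my-rmse') is returned last. The helper names are illustrative; only get_label and get_weight are real DMatrix methods.

```python
import math

def weighted_rmse(preds, labels, weights=None):
    """sqrt(sum(w * (y - p)^2) / sum(w)); unit weights if none are given."""
    if weights is None or len(weights) == 0:
        weights = [1.0] * len(labels)
    num = sum(w * (y - p) ** 2 for p, y, w in zip(preds, labels, weights))
    return math.sqrt(num / sum(weights))

def my_metric(preds, labels):
    """The question's custom metric: sqrt(sum((y - p)^4) / n), unweighted."""
    return math.sqrt(sum((y - p) ** 4 for p, y in zip(preds, labels)) / len(labels))

def combined_feval(preds, dmat):
    """Return several (name, value) pairs from one custom evaluation function.

    'my-rmse' is deliberately last: the assumed legacy rule monitors the
    last reported metric, so it (not 'mymetric') drives early stopping.
    """
    labels = list(dmat.get_label())
    weights = list(dmat.get_weight())  # empty when the DMatrix has no weights
    preds = list(preds)
    return [
        ('mymetric', my_metric(preds, labels)),
        ('my-rmse', weighted_rmse(preds, labels, weights)),
    ]
```

Under these assumptions you would pass feval=combined_feval together with early_stopping_rounds=3; the built-in 'rmse' can stay in eval_metric for display, but the hand-written copy is what the stopping rule watches. For "tweedie-nloglik" the same pattern would mean re-implementing that formula by hand.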

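XGBoost has a built-in early-stopping callback that lets you specify which dataset and which metric to use for early stopping. Note that xgb.callback.EarlyStopping is part of the callback interface introduced in XGBoost 1.3, so it is not available in the 0.90 release mentioned in the question. In your case, you need to create a new early-stopping callback like this: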

early_stop = xgb.callback.EarlyStopping(rounds=3,
                                        metric_name='rmse',
                                        data_name='dtest')

Then add it to the list of callbacks when you call xgb.train:

bst = xgb.train(params={'objective': 'reg:squarederror',
                        'eval_metric': 'rmse',
                        'disable_default_eval_metric': 0},
                num_boost_round=20, dtrain=dtrain,
                evals=[(dtrain, 'dtrain'), (dtest, 'dtest')],
                evals_result=results_learning,
                feval=mymetric,
                callbacks=[early_stop])
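For the model-selection goal in the question, the chosen round can be replayed afterwards from the evals_result dictionary, which has the shape {data_name: {metric_name: [one value per round]}}, and the custom metric read at that round. best_iteration below is a hypothetical helper, not part of XGBoost; it is a sketch that assumes the usual patience rule (stop once `rounds` consecutive rounds pass without a strict improvement on the best value so far).

```python
def best_iteration(evals_result, data_name, metric_name, rounds, maximize=False):
    """Replay a patience-based early-stopping rule on an evals_result dict.

    Returns (best_round, stop_round). stop_round is the round at which the
    rule fires, or the last recorded round if it never fires.
    """
    history = evals_result[data_name][metric_name]
    best_round, best_value = 0, history[0]
    for i, value in enumerate(history[1:], start=1):
        improved = value > best_value if maximize else value < best_value
        if improved:
            best_round, best_value = i, value
        elif i - best_round >= rounds:
            # `rounds` rounds in a row without improvement: stop here.
            return best_round, i
    return best_round, len(history) - 1
```

With the training call above, best_iteration(results_learning, 'dtest', 'rmse', 3) would give the round selected by dtest-rmse, and results_learning['dtest']['mymetric'] at that index is the custom score to compare across candidate models.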

For more details, see this page of the documentation: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
