How to get the individual tree predictions of an XGBoost regressor?



I have already read How to get each individual tree's prediction in xgboost? and tried the approach there:

import numpy as np
import xgboost as xgb
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=1000)
model.fit(X_train, y_train)
booster_ = model.get_booster()
individual_preds = []
# iterating over a Booster yields one single-tree Booster per boosting round
for tree_ in booster_:
    individual_preds.append(
        tree_.predict(xgb.DMatrix(X_test)),
    )
individual_preds = np.vstack(individual_preds)

The results for the individual trees are far from the result of booster_.predict(xgb.DMatrix(X_test)) (they are centered around 0.5). How can I get the individual tree predictions of an XGBoost regressor, and how can I make them comparable to the combined prediction?

Looking at the xgboost API, iteration_range seems suitable for this request, if I understand the question correctly:

iteration_range (Tuple[int, int]) –

Specifies which layer of trees are used in prediction. For example, if a random forest is trained with 100 rounds, specifying iteration_range=(10, 20) means only the forests built during the [10, 20) (half open set) rounds are used in this prediction.

For example, I use the California housing data to train an XGB regression model:

import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()
X_train, X_valid, y_train, y_valid = train_test_split(housing.data, housing.target,
                                                      test_size=0.33, random_state=11)
dtrain = xgb.DMatrix(data=X_train, label=y_train)
dvalid = xgb.DMatrix(data=X_valid, label=y_valid, feature_names=list(housing.feature_names))
# define model and train
params_reg = {"max_depth": 4, "eta": 0.3, "objective": "reg:squarederror", "subsample": 1}
xgb_model_reg = xgb.train(params=params_reg, dtrain=dtrain, num_boost_round=100,
                          early_stopping_rounds=20, evals=[(dtrain, "train")])
# predict
y_pred = xgb_model_reg.predict(dvalid)
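
As a quick sanity check of the half-open interval semantics quoted above (a minimal sketch using the model and data just trained):

import numpy as np

# iteration_range=(0, 100) covers all 100 boosting rounds,
# so it should reproduce the full prediction
full = xgb_model_reg.predict(dvalid, iteration_range=(0, 100))
print(np.allclose(full, y_pred))  # expected: True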

The prediction for a random row, 500, is 1.9630624. Below I use iteration_range to include a single tree in the prediction, and print the result for each tree index:

for tree in range(0, 100):
    print(tree, xgb_model_reg.predict(dvalid, iteration_range=(tree, tree + 1))[500])

Here is an excerpt of the output:

0 0.9880972
1 0.5706124
2 0.59768033
3 0.51785016
4 0.58512527
5 0.5990092
6 0.6660166
7 0.46186835
8 0.5213114
9 0.5857907
10 0.4683379
11 0.54352343
12 0.46028078
13 0.4823497
14 0.51296484
15 0.49818778
16 0.50080884
...
97 0.5000746
98 0.49949
99 0.5004089
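
Note that these single-tree outputs are not raw leaf contributions: each one still includes the model's base_score (the global bias, which the values clustering around 0.5 suggest is 0.5 in this run; that value is an assumption here). Under that assumption, a minimal sketch of how the per-tree predictions recombine into the full prediction:

# each iteration_range=(t, t+1) prediction is base_score + contribution of tree t,
# so summing 100 of them counts base_score 100 times; subtract the 99 extras
base_score = 0.5  # assumption: the default/estimated bias in this run
per_tree = [xgb_model_reg.predict(dvalid, iteration_range=(t, t + 1))[500]
            for t in range(100)]
reconstructed = sum(per_tree) - (100 - 1) * base_score
print(reconstructed)  # should be close to y_pred[500], i.e. about 1.963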

I suppose you mostly know how to construct individual predictions; the question is how to make them sum up to the overall prediction.

The first thing is about base_score. According to https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters, if base_score is not set, it is estimated, and it is hard to obtain and correctly apply this estimate. So, to get predictable behavior from the booster, I recommend explicitly setting the initial bias to zero. Secondly, all those transformations involving the sigmoid are only valid for classifiers; regressors do not need them at all.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import xgboost

housing = fetch_california_housing()
x_train, x_valid, y_train, y_valid = train_test_split(housing.data, housing.target,
                                                      test_size=0.33, random_state=11)
reg = xgboost.XGBRegressor(n_estimators=7, base_score=0)  # explicitly set base_score to zero
reg.fit(x_train, y_train)
xm = xgboost.DMatrix(x_valid)
individual_preds = [booster.predict(xm) for booster in reg.get_booster()]
y = reg.predict(x_valid)
print(sum(individual_preds) - y)  # that should output a zero vector
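
If you prefer to keep the estimated base_score instead of zeroing it, a sketch of one way to recover it and fold it back in (this assumes the JSON layout emitted by Booster.save_config, where the bias sits under learner/learner_model_param/base_score):

import json

# read the (possibly estimated) bias back out of the booster configuration
config = json.loads(reg.get_booster().save_config())
base_score = float(config["learner"]["learner_model_param"]["base_score"])
# every per-tree prediction includes base_score once, so drop the duplicates
n = len(individual_preds)
print(sum(individual_preds) - (n - 1) * base_score - y)  # should again be near zero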
