XGBRegressor is much slower than GradientBoostingRegressor



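I am new to xgboost and am trying to learn how to use it by comparing it with a traditional gbm. However, I noticed that xgboost is much slower than gbm. Here is the example: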

import time

from sklearn.model_selection import KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor
# Note: load_boston was removed in scikit-learn 1.2; this needs an older release.
from sklearn.datasets import load_boston

boston = load_boston()
X = boston.data
y = boston.target

# 5-fold CV, R^2 scoring, 4 parallel workers for the grid search
kf = KFold(n_splits=5)
cv_params = {'cv': kf, 'scoring': 'r2', 'n_jobs': 4, 'verbose': 1}

gbm = GradientBoostingRegressor()
xgb = XGBRegressor()
grid = {'n_estimators': [100, 300, 500], 'max_depth': [3, 5]}

# Time the grid search for sklearn's GBM
timer = time.time()
gbm_cv = GridSearchCV(gbm, param_grid=grid, **cv_params).fit(X, y)
print('GBM time: ', time.time() - timer)

# Time the same grid search for XGBoost
timer = time.time()
xgb_cv = GridSearchCV(xgb, param_grid=grid, **cv_params).fit(X, y)
print('XGB time: ', time.time() - timer)

On a MacBook Pro with 8 cores, the output is:

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=4)]: Done  30 out of  30 | elapsed:    1.9s finished
GBM time:  2.262791872024536
Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=4)]: Done  30 out of  30 | elapsed:   16.4s finished
XGB time:  17.902266025543213

I thought XGBoost was supposed to be much faster, so I must be doing something wrong. Can someone help point out what I am doing wrong?

This is the output when I run it on my machine without setting the n_jobs parameter in cv_params:

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    4.1s finished
('GBM time: ', 4.248916864395142)
Fitting 5 folds for each of 6 candidates, totalling 30 fits
('XGB time: ', 2.934467077255249)
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    2.9s finished

With n_jobs set to 4, GBM finishes in 2.5s, but XGB takes a very long time.

So maybe this is an n_jobs problem! Perhaps the XGBoost library is not configured to work well when GridSearchCV runs it with n_jobs.
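One plausible reading of these timings, which the numbers alone do not prove, is thread oversubscription: XGBoost uses all available threads for each fit by default, so 4 GridSearchCV workers may end up competing for the same cores. Below is a minimal sketch for testing that idea by pinning each XGBoost fit to one thread. It assumes a recent xgboost where XGBRegressor accepts n_jobs (older releases used nthread). The helper names timed_grid_search and make_data are made up for illustration, and the data is a synthetic stand-in, not the Boston set.

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold


def timed_grid_search(estimator, grid, X, y, search_jobs):
    """Run GridSearchCV and return (elapsed_seconds, fitted_search)."""
    search = GridSearchCV(
        estimator,
        param_grid=grid,
        cv=KFold(n_splits=5),
        scoring="r2",
        n_jobs=search_jobs,  # parallelism across folds and candidates
    )
    start = time.perf_counter()
    search.fit(X, y)
    return time.perf_counter() - start, search


def make_data():
    # Synthetic stand-in with the same shape as the Boston data (506 x 13).
    # This is an assumption for illustration, not the original dataset.
    return make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)


if __name__ == "__main__":
    X, y = make_data()
    grid = {"n_estimators": [100, 300, 500], "max_depth": [3, 5]}
    estimators = {"GBM": GradientBoostingRegressor()}
    try:
        from xgboost import XGBRegressor

        # n_jobs=1 keeps each XGBoost fit single-threaded, so the 4 search
        # workers do not each try to use every core.
        estimators["XGB (n_jobs=1)"] = XGBRegressor(n_jobs=1)
    except ImportError:
        print("xgboost is not installed; timing GBM only")

    for name, est in estimators.items():
        elapsed, _ = timed_grid_search(est, grid, X, y, search_jobs=4)
        print(f"{name} time: {elapsed:.2f}s")
```

If the XGB time drops to roughly the GBM range with this setting, oversubscription is the likely cause. If it does not, the problem is probably elsewhere, for example in how xgboost was built or installed.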
