I'm new to this, so apologies if the answer here is obvious, but I'm trying to take a pandas DataFrame and get XGBoost in Python to give me the same predictions I get when I do the same exercise with the scikit-learn wrapper. So far I haven't been able to. As an example, I take the Boston dataset, convert it to a pandas DataFrame, train on the first 500 observations, and then predict the last 6. I do this first with XGBoost directly and then with the scikit-learn wrapper, and even though I set the model parameters to be the same, I get different predictions. Specifically, the array predictions looks very different from the array predictions2 (see the code below). Any help would be much appreciated!
from sklearn import datasets
import pandas as pd
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from xgboost.sklearn import XGBRegressor
### Use the boston data as an example, train on first 500, predict last 6
boston_data = datasets.load_boston()
df_boston = pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
df_boston['target'] = pd.Series(boston_data.target)
#### Code using XGBoost
Sub_train = df_boston.head(500)
target = Sub_train["target"]
Sub_train = Sub_train.drop('target', axis=1)
Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)
xgtrain = xgb.DMatrix(Sub_train.values, label=target.tolist())  # .as_matrix() was removed in pandas 1.0; use .values
xgtest = xgb.DMatrix(Sub_predict.values)
params = {'booster': 'gblinear', 'objective': 'reg:linear',
'max_depth': 2, 'learning_rate': .1, 'n_estimators': 500, 'min_child_weight': 3, 'colsample_bytree': .7,
'subsample': .8, 'gamma': 0, 'reg_alpha': 1}
model = xgb.train(dtrain=xgtrain, params=params)
predictions = model.predict(xgtest)
#### Code using Sk learn Wrapper for XGBoost
model = XGBRegressor(learning_rate =.1, n_estimators=500,
max_depth=2, min_child_weight=3, gamma=0,
subsample=.8, colsample_bytree=.7, reg_alpha=1,
objective= 'reg:linear')
target = "target"
Sub_train = df_boston.head(500)
Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)
Ex_List = ['target']
predictors = [i for i in Sub_train.columns if i not in Ex_List]
model = model.fit(Sub_train[predictors],Sub_train[target])
predictions2 = model.predict(Sub_predict)
Please see this answer here:

xgboost.train will ignore the parameter n_estimators, while xgboost.XGBRegressor accepts it. In xgboost.train, the number of boosting iterations (i.e. n_estimators) is controlled by num_boost_round (default: 10).

It suggests removing n_estimators from the params supplied to xgb.train and replacing it with num_boost_round.
So change your params like this:
params = {'objective': 'reg:linear',
'max_depth': 2, 'learning_rate': .1,
'min_child_weight': 3, 'colsample_bytree': .7,
'subsample': .8, 'gamma': 0, 'alpha': 1}
and call xgb.train like this:
model = xgb.train(dtrain=xgtrain, params=params, num_boost_round=500)
You will get the same results.
Alternatively, keep your xgb.train call as it is and change the XGBRegressor like this:
model = XGBRegressor(learning_rate =.1, n_estimators=10,
max_depth=2, min_child_weight=3, gamma=0,
subsample=.8, colsample_bytree=.7, reg_alpha=1,
objective= 'reg:linear')
Then you will also get the same results.