XGBoost randomly gives a static prediction of "0.5"



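I am using a scikit-learn pipeline with XGBRegressor. The pipeline runs fine, without any errors. When I predict with this pipeline on the same data several times, the prediction sometimes comes out as 0.5, whereas normal predictions are on the order of 1000.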

For example: (1258.2, 1258.2, 1258.2)

  • The input data is exactly the same
  • The environment is the same

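    import numpy as np
    import xgboost
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # X (a pandas DataFrame) and y are assumed to be defined already
    # Numeric columns: mean imputation, then standardization
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='mean')),
        ('scaler', StandardScaler())])
    # Categorical columns: constant imputation, then one-hot encoding
    categorical_transformer = Pipeline(steps=[
        ('imputer',
         SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
    numeric_features = X.select_dtypes(
        include=['int64', 'float64']).columns
    categorical_features = X.select_dtypes(
        include=['object']).columns
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)])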
    # Number of trees
    n_estimators = [int(x) for x in
                    np.linspace(start=50, stop=1000, num=10)]
    # Maximum number of levels in tree
    max_depth = [int(x) for x in np.linspace(1, 32, 32, endpoint=True)]
    # Booster
    booster = ['gbtree', 'gblinear', 'dart']
    # Gamma (minimum loss reduction required to split)
    gamma = [i / 10.0 for i in range(0, 5)]
    # Learning rate
    learning_rate = np.linspace(0.01, 0.2, 15)
    # Evaluation metric (disabled)
    # eval_metric = ['rmse', 'mae']
    # Regularization
    reg_alpha = [1e-5, 1e-2, 0.1, 1, 100]
    reg_lambda = [1e-5, 1e-2, 0.1, 1, 100]
    # Min child weight
    min_child_weight = list(range(1, 6, 2))
    # Row and column subsampling ratios
    subsample = [i / 10.0 for i in range(6, 10)]
    colsample_bytree = [i / 10.0 for i in range(6, 10)]
    # Create the random grid
    random_grid = {
        'n_estimators': n_estimators,
        'max_depth': max_depth,
        'booster': booster,
        'gamma': gamma,
        'learning_rate': learning_rate,
        # 'eval_metric': eval_metric,
        'reg_alpha': reg_alpha,
        'reg_lambda': reg_lambda,
        'min_child_weight': min_child_weight,
        'subsample': subsample,
        'colsample_bytree': colsample_bytree,
    }
    # Use the random grid to search for the best hyperparameters
    # First create the base model to tune
    rf = xgboost.XGBRegressor(objective='reg:squarederror', n_jobs=4)
    # Random search of parameters, using 3-fold cross validation,
    # search across 100 different combinations, with 4 parallel jobs
    rf_random = RandomizedSearchCV(estimator=rf,
                                   param_distributions=random_grid,
                                   n_iter=100,
                                   cv=3,
                                   verbose=0,
                                   random_state=42,
                                   n_jobs=4)
    pipe = Pipeline(steps=[('preprocessor', preprocessor),
                           ('regressor', rf_random)])
    pipe.fit(X, y)
    

What could be the problem?

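If you are getting some abnormally low predictions, this may indicate that the dependent variable contains outliers. I suggest reading up on outliers and on the different strategies for dealing with them.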

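In general, it is not a good idea to fit a model on all data samples without first removing outliers. Doing so tends to give worse and less representative metrics.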
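As a rough way to look for such outliers in the target, here is a minimal sketch using the interquartile range (Tukey fences). The function name `iqr_outlier_mask`, the factor `k=1.5`, and the demo values are assumptions made for illustration; they do not come from the question.

```python
import numpy as np

def iqr_outlier_mask(y, k=1.5):
    """Return a boolean mask that is True where y falls outside the Tukey fences."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (y < lower) | (y > upper)

# Hypothetical target values: mostly around 1200-1300, with one suspicious 0.5
y_demo = np.array([1200.0, 1258.2, 1310.5, 1190.0, 1275.3, 0.5, 1244.1])
mask = iqr_outlier_mask(y_demo)
print("flagged values:", y_demo[mask])
```

Whether flagged rows should be dropped, capped, or kept depends on the data. The mask only points at the values worth inspecting.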

This may be because your target contains NaN or None values.
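A quick check for this could look like the sketch below, which counts missing and infinite entries in the target before fitting. The helper name `count_invalid_targets` and the demo list are hypothetical. As a possible link to the constant 0.5: older XGBoost releases used 0.5 as the default `base_score`, so a model that learned nothing would predict that value. That is a guess, not something the question confirms.

```python
import numpy as np

def count_invalid_targets(y):
    """Count missing (None/NaN) and infinite entries in a target sequence."""
    # Map None to NaN explicitly so the array can be cast to float
    values = np.array([np.nan if v is None else v for v in y], dtype=float)
    return {
        "missing": int(np.isnan(values).sum()),
        "infinite": int(np.isinf(values).sum()),
    }

# Hypothetical target with one None, one NaN, and one infinite value
y_demo_targets = [1258.2, None, 1301.0, float("nan"), float("inf")]
report = count_invalid_targets(y_demo_targets)
print(report)
```

If either count is non-zero, it is worth cleaning or dropping those rows before calling `pipe.fit(X, y)`.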
