How do I fix "ValueError: feature_names mismatch" in an XGBoost classifier?



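For my work, I split the data, then applied oversampling (because the class distribution is imbalanced) and feature selection. I want to use an XGBoost classifier, but I get the following error: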

ValueError                                Traceback (most recent call last)
<ipython-input-16-ace98cb7898f> in <module>()
5 model.fit(X_train, y_train)
6 # make predictions for test data
----> 7 y_pred = model.predict(X_test)
8 predictions = [round(value) for value in y_pred]
9 # evaluate predictions
2 frames
/usr/local/lib/python3.7/dist-packages/xgboost/core.py in _validate_features(self, data)
1688 
1689                 raise ValueError(msg.format(self.feature_names,
-> 1690                                             data.feature_names))
1691 
1692     def get_split_value_histogram(self, feature, fmap='', bins=None, as_pandas=True):
ValueError: feature_names mismatch.

The code is as follows:

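from imblearn.over_sampling import SMOTE
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    features, label, test_size=0.50, random_state=42)
oversample = SMOTE()
X_train, y_train = oversample.fit_resample(X_train, y_train)
estimator = LogisticRegression()
selector = RFE(estimator, n_features_to_select=5, step=1)
selector = selector.fit(X_train, y_train)
model = XGBClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)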

How can I fix this error, given that oversampling and feature selection always have to happen after the data is split?

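You applied the feature selector to the training data only, which is the main cause of the feature mismatch. Fit the selector on the training data, then apply that same fitted instance to the test data so that both sets end up with the same features.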

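selector = RFE(estimator, n_features_to_select=5, step=1)
# RFE is supervised, so fitting needs the labels as well as the features
X_train = selector.fit_transform(X_train, y_train)
# reuse the fitted selector; do not refit it on the test data
X_test = selector.transform(X_test)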
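
To see the whole flow in one place, here is a minimal sketch of the fix above. It is not the asker's actual code. The data comes from `make_classification` as a stand-in for `features` and `label`, the names `X_train_sel` and `X_test_sel` are made up for illustration, and the oversampling step is only marked with a comment. If xgboost is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier` so that it still runs.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier as Classifier
except ImportError:
    # Fallback so the sketch still runs without xgboost installed.
    from sklearn.ensemble import GradientBoostingClassifier as Classifier

# Synthetic stand-in for the asker's `features` and `label` (assumption).
features, label = make_classification(
    n_samples=400, n_features=12, n_informative=5, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    features, label, test_size=0.50, random_state=42
)

# Oversampling (e.g. SMOTE) would go here, on the training split only.

# Fit the selector on the training split, then reuse the same fitted
# instance so both splits are reduced to the same 5 columns.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

model = Classifier()
model.fit(X_train_sel, y_train)
y_pred = model.predict(X_test_sel)

print(X_train_sel.shape, X_test_sel.shape)
```

With pandas DataFrames as input, `selector.transform` should by default return plain NumPy arrays for both splits, so the model sees the same number of columns and no conflicting column names at fit and predict time. Wrapping the selector and the classifier in a single pipeline is another way to guarantee that both steps always see identically transformed data.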
