KeyError: '[...] not in index' when the train/test sets were manually split into two files



I get the error KeyError: '[...] not in index' when running the sklearn hyperopt regression example on my own dataset.

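I have seen other answers to this problem where the solution was, for example, that X_train should be set as X_train = X.iloc[train_indices], and the missing iloc was the problem. In my case, however, I have already split the dataset into two files by hand, so I do not need to do any slicing or indexing. I use a separate script to take the full dataset and split it into a training-set file and a test-set file. These files have no index column, only numbers. In case you are wondering about the dataset, it is from UCI and is the protein physicochemical properties dataset.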

from hpsklearn import HyperoptEstimator, any_regressor, xgboost_regression
from sklearn.datasets import load_iris
from hyperopt import tpe
import numpy as np
import pandas as pd
# Download the data and split into training and test sets
X_train = pd.read_csv('data2/CASP_train.csv')
X_test = pd.read_csv('data2/CASP_test.csv')
y_train = X_train['Y']
y_test = X_test['Y']
X_train.drop('Y',axis=1,inplace=True)
X_test.drop('Y',axis=1,inplace=True)
print(list(X_test))
#X_train.drop(list(X_train)[0],axis=1,inplace=True)
#X_test.drop(list(X_test)[0],axis=1,inplace=True)
print(list(X_test))
print(X_train)
# Instantiate a HyperoptEstimator with the search space and number of evaluations
estim = HyperoptEstimator(regressor=xgboost_regression('xgreg'),
                          preprocessing=('my_pre'),
                          algo=tpe.suggest,
                          max_evals=100,
                          trial_timeout=120)
estim.fit(X_train, y_train)
print(estim.score(X_test, y_test))
print(estim.best_model())

The full traceback is as follows:

Traceback (most recent call last):
  File "PRSAXGB.py", line 30, in <module>
    estim.fit(X_train, y_train)
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hpsklearn/estimator.py", line 783, in fit
    fit_iter.send(increment)
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hpsklearn/estimator.py", line 693, in fit_iter
    return_argmin=False, # -- in case no success so far
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 389, in fmin
    show_progressbar=show_progressbar,
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hyperopt/base.py", line 643, in fmin
    show_progressbar=show_progressbar)
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 408, in fmin
    rval.exhaust()
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 262, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 227, in run
    self.serial_evaluate()
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 141, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hyperopt/base.py", line 848, in evaluate
    rval = self.fn(pyll_rval)
  File "/home/rj/anaconda3/lib/python3.6/site-packages/hpsklearn/estimator.py", line 656, in fn_with_timeout
    raise fn_rval[1]
KeyError: '[    0     1     2 ... 29264 29265 29266] not in index'

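The solution is to pass the underlying NumPy arrays instead of the pandas objects: estim.fit(X_train.values, y_train.values). The KeyError suggests that, during fitting, rows are selected with an array of integer positions; on a DataFrame, plain [] indexing treats those integers as column labels, which is why they are reported as not in the index.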
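Here is a minimal sketch of why converting to NumPy arrays helps. It uses hypothetical toy data rather than the CASP files. It assumes, based only on the KeyError message, that the estimator picks rows with an array of integer positions; hpsklearn's internals are not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the CASP training file.
X_train = pd.DataFrame({"F1": [0.1, 0.2, 0.3, 0.4],
                        "F2": [1.0, 2.0, 3.0, 4.0]})
y_train = pd.Series([10.0, 20.0, 30.0, 40.0], name="Y")

# Integer row positions, as an internal train/validation split might produce
# (an assumption for illustration).
rows = np.array([0, 2, 3])

# Plain [] on a DataFrame looks the values up as column labels,
# so integer positions are not found and a KeyError is raised.
try:
    X_train[rows]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# NumPy arrays are indexed by position, so the same selection works.
X_sub = X_train.values[rows]
y_sub = y_train.values[rows]
print(X_sub.shape, y_sub)
```

With real data the same conversion is applied at the call site, i.e. estim.fit(X_train.values, y_train.values), and likewise estim.score(X_test.values, y_test.values) if scoring hits the same error.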
