Fitting a model with KFold splits returns "not in index"

I have a DataFrame like this:

    Col1  Col2
10     1     6
11     3     8
12     9     4
13     7     2
14     4     3
15     2     9
16     6     7
17     8     1
18     5     5

I want to use KFold cross-validation to fit my model and make predictions.

for train_index, test_index in kf.split(X_train, y_train):
    model.fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])

This code produces the following error:

"[1 2 4 7] not in index"

I can see that after KFold.split(), train_index and test_index don't use the DataFrame's actual index labels.
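To see what KFold actually yields, here is a minimal sketch that rebuilds the DataFrame from the question and prints the fold indices. The choice of n_splits=3 is an arbitrary assumption for illustration. The printed arrays are integer positions 0..8, never the labels 10..18:

```python
import pandas as pd
from sklearn.model_selection import KFold

# The DataFrame from the question: index labels 10..18, not 0..8
df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)

kf = KFold(n_splits=3)  # n_splits=3 is an arbitrary choice for this sketch
for train_index, test_index in kf.split(df):
    # split() yields integer positions 0..len(df)-1, not the index labels
    print("train positions:", train_index, "test positions:", test_index)
```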

So I can't fit my model.

Does anyone have an idea?

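As far as I can see, your DataFrame's index starts at 10, not 0, and, as you said, the splits from sklearn use positional indices that start at 0. One solution is to reset the DataFrame's index: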

df = df.reset_index(drop=True)
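A minimal sketch of the reset approach, assuming (hypothetically) that Col2 is the target, Col1 the only feature, and LinearRegression the model. One caveat: after the reset, labels equal positions, so select rows with .loc (or on a Series, plain indexing). On a DataFrame, plain X[train_index] selects columns, not rows:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)
df = df.reset_index(drop=True)  # index becomes 0..8, matching KFold's positions

# Hypothetical setup: predict Col2 from Col1
X = df[["Col1"]]
y = df["Col2"]

model = LinearRegression()
kf = KFold(n_splits=3)
predictions = []
for train_index, test_index in kf.split(X, y):
    # Labels now equal positions, so .loc picks the intended rows
    model.fit(X.loc[train_index], y.loc[train_index])
    predictions.append(model.predict(X.loc[test_index]))
```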

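Another solution is to use .iloc on the DataFrame, which selects rows by position, so it would look like this (assuming y is an array; if it is a DataFrame or Series, you have to use .iloc there too):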

for train_index, test_index in kf.split(X, y):
    model.fit(X.iloc[train_index], y[train_index])
    y_pred = model.predict(X.iloc[test_index])
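Here is a sketch of the .iloc variant with both X and y kept as pandas objects, so the original 10..18 index is left untouched. The target/feature choice, the LinearRegression model and the out-of-fold Series are illustrative assumptions. Each fold's predictions are written back positionally, so they line up with the original labels:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)
X = df[["Col1"]]  # hypothetical feature
y = df["Col2"]    # hypothetical target (a Series, so it needs .iloc too)

model = LinearRegression()
kf = KFold(n_splits=3)
oof = pd.Series(np.nan, index=df.index)  # out-of-fold predictions, original labels
for train_index, test_index in kf.split(X, y):
    # .iloc selects by position, which is exactly what KFold returns
    model.fit(X.iloc[train_index], y.iloc[train_index])
    oof.iloc[test_index] = model.predict(X.iloc[test_index])
```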

A third solution is to convert the DataFrame to an array, since NumPy indexing is always positional:

for train_index, test_index in kf.split(X, y):
    model.fit(X.values[train_index], y[train_index])
    y_pred = model.predict(X.values[test_index])
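A sketch of the array approach that converts once, before the loop, using to_numpy() (the newer pandas equivalent of .values). As before, the feature/target choice and LinearRegression are only assumptions for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)
# Convert once; the index labels are dropped and only positions remain
X_arr = df[["Col1"]].to_numpy()
y_arr = df["Col2"].to_numpy()

model = LinearRegression()
kf = KFold(n_splits=3)
fold_preds = []
for train_index, test_index in kf.split(X_arr, y_arr):
    model.fit(X_arr[train_index], y_arr[train_index])
    fold_preds.append(model.predict(X_arr[test_index]))
```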

Edit: I can even see a fourth solution, which may be what you want. You can get the array of index labels in the training set simply with df.index.values[train_index].
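A sketch of that fourth approach: translate KFold's positions into the DataFrame's own labels, then select with .loc. This keeps the original labels visible, which can help if you want to record which rows landed in each fold. The folds list is a hypothetical name used only for this illustration:

```python
import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)

kf = KFold(n_splits=3)
folds = []  # hypothetical: (train_labels, test_labels) per fold
for train_index, test_index in kf.split(df):
    # Map positions 0..8 to the actual labels 10..18
    train_labels = df.index.values[train_index]
    test_labels = df.index.values[test_index]
    train_df = df.loc[train_labels]  # label-based selection now works
    test_df = df.loc[test_labels]
    folds.append((train_labels, test_labels))
```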
