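After creating cross_validation.KFold(n, n_folds=folds), I would like to access the indices for training and testing a single fold, instead of going through all the folds.
So let's take this example code: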
import numpy as np
from sklearn import cross_validation
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = cross_validation.KFold(4, n_folds=2)
>>> print(kf)
sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False,
random_state=None)
>>> for train_index, test_index in kf:
I would like to access the first fold in kf like this (instead of using the for loop):
train_index, test_index in kf[0]
This should return just the first fold, but instead I get the error: "TypeError: 'KFold' object does not support indexing"
The output I want:
>>> train_index, test_index in kf[0]
>>> print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]
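Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html
Question
How do I retrieve the train and test indices for a single fold only, without going through the whole for loop?
You are on the right track. All you need to do now is: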
kf = cross_validation.KFold(4, n_folds=2)
mylist = list(kf)
train, test = mylist[0]
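kf behaves like a generator: it does not compute a train-test split until that split is requested. This keeps memory usage down, because you never store items you do not need. Building a list from the KFold object forces it to compute every split, after which the splits can be indexed.
Here are two great SO questions that explain what generators are: one and two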
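To make the generator behaviour concrete, here is a minimal sketch in plain Python. The function toy_kfold is a hypothetical stand-in written for illustration only; it is not scikit-learn's implementation, and it assumes the number of samples divides evenly by the number of folds.

```python
def toy_kfold(n_samples, n_folds):
    """Hypothetical stand-in for a K-fold splitter (not scikit-learn's code).

    Lazily yields (train_indices, test_indices) pairs over contiguous folds.
    Assumes n_samples is divisible by n_folds to keep the sketch short.
    """
    fold_size = n_samples // n_folds
    indices = list(range(n_samples))
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test


splits = toy_kfold(4, 2)

# A generator has no __getitem__, so indexing raises TypeError,
# much like the error in the question.
try:
    splits[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# next() computes only the first split; the second one is never built.
first_train, first_test = next(toy_kfold(4, 2))

# list() forces every split to be computed, after which indexing works.
all_splits = list(toy_kfold(4, 2))
second_train, second_test = all_splits[1]

print(indexing_failed)                              # True
print("TRAIN:", first_train, "TEST:", first_test)   # TRAIN: [2, 3] TEST: [0, 1]
print("TRAIN:", second_train, "TEST:", second_test) # TRAIN: [0, 1] TEST: [2, 3]
```

Using next() is the cheaper choice when only the first fold is needed; list() is more convenient when several folds will be looked up by position.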
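Edit, November 2018
The API has changed since sklearn 0.20: the cross_validation module was removed in favor of model_selection. Updated example (for py3.6):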
from sklearn.model_selection import KFold
import numpy as np
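kf = KFold(n_splits=2)  # 2 folds, so each test fold holds 2 of the 4 samples
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
# split() yields index arrays (not rows of X); next() takes only the first fold
train_index, test_index = next(kf.split(X))
In [12]: train_index
Out[12]: array([2, 3])
In [13]: test_index
Out[13]: array([0, 1])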
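If you need a fold other than the first, one possible approach is to skip ahead with itertools.islice rather than building the full list. The sketch below assumes scikit-learn 0.20 or later; nth_fold is a hypothetical helper name introduced here, not part of the scikit-learn API.

```python
from itertools import islice

import numpy as np
from sklearn.model_selection import KFold


def nth_fold(splitter, X, i):
    """Hypothetical helper: return (train_index, test_index) of fold i (0-based)."""
    # islice skips the first i splits; next() then takes fold i and stops there.
    return next(islice(splitter.split(X), i, None))


X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=2)

# Fetch the second fold (index 1) without storing the other splits.
train_index, test_index = nth_fold(kf, X, 1)
print("TRAIN:", train_index, "TEST:", test_index)  # TRAIN: [0 1] TEST: [2 3]
```

Note that this raises StopIteration if i is not smaller than the number of splits, so a caller may want to check the index first.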
# We save the samples of every fold in separate lists, then access a given fold with [i]
from sklearn.model_selection import KFold
import numpy as np
import pandas as pd
kf = KFold(n_splits=4)
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
Y = np.array([0,0,0,1])
Y=Y.reshape(4,1)
X=pd.DataFrame(X)
Y=pd.DataFrame(Y)
X_train_base=[]
X_test_base=[]
Y_train_base=[]
Y_test_base=[]
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index,:], X.iloc[test_index,:]
    Y_train, Y_test = Y.iloc[train_index,:], Y.iloc[test_index,:]
    X_train_base.append(X_train)
    X_test_base.append(X_test)
    Y_train_base.append(Y_train)
    Y_test_base.append(Y_test)
print(X_train_base[0])
print(Y_train_base[0])
print(X_train_base[1])
print(Y_train_base[1])