How to create K-Fold cross-validation training sets without scikit-learn

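I have a dataset with 95 rows and 9 columns and want to run 5-fold cross-validation. During training, the first 8 columns (the features) are used to predict the ninth column. My test sets are correct, but my x training set has shape (4, 19, 9) when it should have only 8 columns, and my y training set has shape (4, 9) when it should have 19 rows. Am I indexing the subarrays incorrectly?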

kdata = data[0:95,:] # Need total rows to be divisible by 5, so ignore last 2 rows
np.random.shuffle(kdata) # Shuffle all rows
folds = np.array_split(kdata, k) # each fold is 19 rows x 9 columns
for i in range (k-1):
    xtest = folds[i][:,0:7] # Set ith fold to be test
    ytest = folds[i][:,8]
    new_folds = np.delete(folds,i,0)
    xtrain = new_folds[:][:][0:7] # training set is all folds, all rows x 8 cols
    ytrain = new_folds[:][:][8]   # training y is all folds, all rows x 1 col

Welcome to Stack Overflow.

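After removing the test fold, you need to stack the remaining folds row-wise with np.row_stack(), so that the training data becomes a single 2-D array again.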
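To see why the stacking step is needed, here is a minimal sketch that uses an array of zeros as a stand-in for the real data (the 95 x 9 shape is taken from the question; the variable names are only illustrative). np.delete applied to the list of folds returns one 3-D array, and stacking flattens it back to 2-D. The sketch uses np.vstack, for which np.row_stack is an alias; recent NumPy releases deprecate the np.row_stack name.

```python
import numpy as np

k = 5
dummy = np.zeros((95, 9))            # stand-in for the real data (assumption)
folds = np.array_split(dummy, k)     # list of 5 arrays, each 19 x 9

remaining = np.delete(folds, 0, 0)   # drop fold 0 -> a single 3-D array
stacked = np.vstack(remaining)       # stack the 4 remaining folds row-wise

print(remaining.shape)  # (4, 19, 9)
print(stacked.shape)    # (76, 9)
```

The (4, 19, 9) shape is exactly the one reported in the question, which suggests the unstacked 3-D array was being sliced directly.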

Also, I think you are slicing the arrays incorrectly. In Python and NumPy, slices are [inclusive:exclusive], so the slice [0:7] selects only 7 columns, not the 8 feature columns you want.
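A quick way to check the slicing rule is to slice a single dummy row whose values equal their column indices (the array below is made up for illustration):

```python
import numpy as np

row = np.arange(9).reshape(1, 9)   # one dummy row with columns 0..8

seven = row[:, 0:7]   # columns 0..6, the stop index 7 is excluded
eight = row[:, :8]    # columns 0..7, the 8 feature columns
target = row[:, 8]    # column 8 only, the target

print(seven.shape)  # (1, 7)
print(eight.shape)  # (1, 8)
print(target)       # [8]
```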

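Similarly, if you want 5 folds, the for loop should use range(k), which gives [0,1,2,3,4], rather than range(k-1), which gives only [0,1,2,3] and therefore never uses the last fold as a test set.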
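The difference between the two loop ranges can be confirmed directly (k = 5 as in the question; the variable names are illustrative):

```python
k = 5

short = list(range(k - 1))   # misses the last fold
full = list(range(k))        # visits every fold once

print(short)  # [0, 1, 2, 3]
print(full)   # [0, 1, 2, 3, 4]
```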

Modified code:

import numpy as np

k = 5
kdata = data[0:95, :]             # data is the array from the question; keep 95 rows
np.random.shuffle(kdata)          # Shuffle all rows in place
folds = np.array_split(kdata, k)  # each fold is 19 rows x 9 columns
for i in range(k):
    xtest = folds[i][:, :8]       # Set ith fold to be test (8 feature columns)
    ytest = folds[i][:, 8]        # ninth column is the target
    # Drop the ith fold, then stack the remaining folds row-wise
    new_folds = np.row_stack(np.delete(folds, i, 0))
    xtrain = new_folds[:, :8]
    ytrain = new_folds[:, 8]
    # some print functions to help you debug
    print(f'Fold {i}')
    print(f'xtest shape  : {xtest.shape}')
    print(f'ytest shape  : {ytest.shape}')
    print(f'xtrain shape : {xtrain.shape}')
    print(f'ytrain shape : {ytrain.shape}\n')

It prints the fold number and the expected shapes of the training and test sets:

Fold 0
xtest shape  : (19, 8)
ytest shape  : (19,)
xtrain shape : (76, 8)
ytrain shape : (76,)

Fold 1
xtest shape  : (19, 8)
ytest shape  : (19,)
xtrain shape : (76, 8)
ytrain shape : (76,)

Fold 2
xtest shape  : (19, 8)
ytest shape  : (19,)
xtrain shape : (76, 8)
ytrain shape : (76,)

Fold 3
xtest shape  : (19, 8)
ytest shape  : (19,)
xtrain shape : (76, 8)
ytrain shape : (76,)

Fold 4
xtest shape  : (19, 8)
ytest shape  : (19,)
xtrain shape : (76, 8)
ytrain shape : (76,)
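If you would rather not discard rows to make the count divisible by k, one possible variant is to split shuffled row indices instead of the data itself. The sketch below is not part of the code above: kfold_splits is a hypothetical helper name, the data is random stand-in data, and it assumes the last column is the target and all other columns are features.

```python
import numpy as np

def kfold_splits(data, k, seed=0):
    """Yield (xtrain, ytrain, xtest, ytest) for each of k folds.

    Hypothetical helper: assumes the last column of `data` is the target
    and every other column is a feature.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))   # shuffled row indices
    folds = np.array_split(indices, k)     # fold sizes may differ by one
    for i in range(k):
        test_idx = folds[i]
        # Concatenate the index arrays of every fold except the ith.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        train, test = data[train_idx], data[test_idx]
        yield train[:, :-1], train[:, -1], test[:, :-1], test[:, -1]

# Usage with random stand-in data of 97 rows (not divisible by 5).
demo = np.random.default_rng(1).normal(size=(97, 9))
for xtrain, ytrain, xtest, ytest in kfold_splits(demo, k=5):
    print(xtrain.shape, ytrain.shape, xtest.shape, ytest.shape)
```

Working with indices leaves the original array unshuffled and lets np.array_split produce folds of slightly different sizes, which np.delete on a list of unequal arrays would not handle.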
