K-fold cross-validation on a table in Matlab

I have a Matlab table that contains information about students (both numeric and categorical variables). Here is a sample:

School = {'GB'; 'UR'; 'GB'; 'GB'; 'UR'};
School = categorical(School);
Age = [14;14;12;16;19];
Relationship = {'yes'; 'yes'; 'no'; 'no'; 'yes'};
Relationship = categorical(Relationship);
Status = {'ft'; 'pt'; 'ft'; 'ft'; 'ft'};
Status = categorical(Status);
Father_Job = {'pol'; 'ser'; 'oth'; 'ele'; 'cle'};
Father_Job = categorical(Father_Job);
Health = [1;2;3;3;5];
Exam = {'pass'; 'pass'; 'fail'; 'fail'; 'pass'};
Exam = categorical(Exam);
T = table(School, Age, Relationship, Status, Father_Job, Health, Exam);

T =
School    Age    Relationship    Status    Father_Job    Health    Exam
______    ___    ____________    ______    __________    ______    ____
GB      14         yes           ft         pol          1       pass
UR      14         yes           pt         ser          2       pass
GB      12         no            ft         oth          3       fail
GB      16         no            ft         ele          3       fail
UR      19         yes           ft         cle          5       pass

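I want to use this data to predict and classify exam pass/fail. I plan to use fitglm for logistic regression and fitcnb for a naive Bayes classifier. I know that both functions handle categorical variables well in Matlab, so using my table with its categorical variables should not be a problem.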

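However, I run into a problem when I try to perform 10-fold cross-validation with cvpartition and crossvalind. When I try to create the fold indices, I get the following error:

Error using statslib.internal.grp2idx
Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts) is not supported. Use a row subscript and a variable subscript.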
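A likely cause, offered as a hedged diagnosis rather than a confirmed one: `T(:, 7)` uses parentheses, so it returns a one-column table instead of the `Exam` vector, and cvpartition/crossvalind pass that table to `grp2idx`, which tries to index it linearly. The sketch below rebuilds the sample table and contrasts the two indexing forms; `Ytable` and `cvp` are illustrative names, not part of the question.

```matlab
% Rebuild the sample table from the question
School       = categorical({'GB'; 'UR'; 'GB'; 'GB'; 'UR'});
Age          = [14; 14; 12; 16; 19];
Relationship = categorical({'yes'; 'yes'; 'no'; 'no'; 'yes'});
Status       = categorical({'ft'; 'pt'; 'ft'; 'ft'; 'ft'});
Father_Job   = categorical({'pol'; 'ser'; 'oth'; 'ele'; 'cle'});
Health       = [1; 2; 3; 3; 5];
Exam         = categorical({'pass'; 'pass'; 'fail'; 'fail'; 'pass'});
T = table(School, Age, Relationship, Status, Father_Job, Health, Exam);

% Parentheses indexing returns a 5-by-1 table (this is what grp2idx rejects)
Ytable = T(:, 7);
% Dot indexing returns the underlying 5-by-1 categorical vector
Y = T.Exam;                      % equivalently: T{:, 7}

% cvpartition accepts a categorical grouping vector (stratified k-fold)
cvp = cvpartition(Y, 'KFold', 5);

disp(class(Ytable))              % table
disp(class(Y))                   % categorical
```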

My goal is to do the following:

% Column 7 (Exam) is the response variable
X = T(:, 1:6);
Y = T(:, 7);
% Create indices of 5-fold cross-validation (here I get errors)
cvpart = cvpartition(Y,'KFold',5);
indices = crossvalind('Kfold',Y,5);
% Create my test and training sets
for i = 1:5
test = (indices == i); 
train = ~test;
Xtrain = X(train,:);
Xtest = X(test,:);
Ytrain = Y(train,:);
Ytest = Y(test,:);
end
% Fit logistic model
mdl = fitglm(Xtrain,Ytrain,'Distribution','binomial')
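A minimal corrected sketch of the loop above, under stated assumptions (it is not the asker's code): the response is taken with `T.Exam`, the model is fitted inside the loop (in the original, each iteration overwrites `Xtrain`/`Ytrain`, so the fit after the loop only sees the last fold), and the table itself is passed to fitglm with `ResponseVar`, since `fitglm(X, y)` is documented for a numeric predictor matrix. The logical column `Passed`, the fold accuracy vector `acc`, and the predictor subset `School`, `Age`, `Relationship` are hypothetical choices made to keep the example small. With only 5 rows, expect MATLAB to warn about rank deficiency or perfectly separated classes.

```matlab
% Rebuild the sample table from the question
School       = categorical({'GB'; 'UR'; 'GB'; 'GB'; 'UR'});
Age          = [14; 14; 12; 16; 19];
Relationship = categorical({'yes'; 'yes'; 'no'; 'no'; 'yes'});
Status       = categorical({'ft'; 'pt'; 'ft'; 'ft'; 'ft'});
Father_Job   = categorical({'pol'; 'ser'; 'oth'; 'ele'; 'cle'});
Health       = [1; 2; 3; 3; 5];
Exam         = categorical({'pass'; 'pass'; 'fail'; 'fail'; 'pass'});
T = table(School, Age, Relationship, Status, Father_Job, Health, Exam);

% Hypothetical helper column: a logical 0/1 response for the binomial model
T.Passed = (T.Exam == 'pass');

k   = 5;
cvp = cvpartition(T.Exam, 'KFold', k);   % categorical vector, not T(:, 7)
acc = zeros(k, 1);                       % test-set accuracy of each fold

for i = 1:cvp.NumTestSets
    idxTrain = training(cvp, i);         % logical index of training rows
    idxTest  = test(cvp, i);             % logical index of test rows
    Ttrain = T(idxTrain, :);             % row subsetting keeps the table type
    Ttest  = T(idxTest, :);

    % Fit one model per fold; fitglm expands categorical table variables
    % into dummy variables, so no manual numeric conversion is needed
    mdl = fitglm(Ttrain, 'Distribution', 'binomial', ...
        'ResponseVar', 'Passed', ...
        'PredictorVars', {'School', 'Age', 'Relationship'});

    pHat = predict(mdl, Ttest);          % predicted probability of passing
    acc(i) = mean((pHat > 0.5) == Ttest.Passed);
end

disp(acc)
```

Naming the index variables `idxTrain`/`idxTest` instead of `train`/`test` also avoids shadowing the `test` method of the cvpartition object.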

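Can anyone shed some light on this? I know I could convert the categorical variables to numeric ones, but I would rather not. Is there a way around this? Thanks a lot.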
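For the naive Bayes part, one possible way around manual fold bookkeeping is to let fitcnb cross-validate itself: it accepts the table plus the name of the categorical response, and its `'KFold'` name-value argument returns a partitioned model whose `kfoldLoss` is the average misclassification rate. This sketch keeps only two categorical predictors, an assumption made because on these 5 rows a numeric predictor such as `Health` has zero variance within the `fail` class, which fitcnb cannot fit with a normal distribution. With 5 folds on 5 rows, this is effectively leave-one-out. `Tcat`, `cvmdl`, and `err` are illustrative names.

```matlab
% Rebuild the sample table from the question
School       = categorical({'GB'; 'UR'; 'GB'; 'GB'; 'UR'});
Age          = [14; 14; 12; 16; 19];
Relationship = categorical({'yes'; 'yes'; 'no'; 'no'; 'yes'});
Status       = categorical({'ft'; 'pt'; 'ft'; 'ft'; 'ft'});
Father_Job   = categorical({'pol'; 'ser'; 'oth'; 'ele'; 'cle'});
Health       = [1; 2; 3; 3; 5];
Exam         = categorical({'pass'; 'pass'; 'fail'; 'fail'; 'pass'});
T = table(School, Age, Relationship, Status, Father_Job, Health, Exam);

% Keep only categorical predictors plus the response; fitcnb treats
% categorical table variables as categorical predictors
Tcat = T(:, {'School', 'Relationship', 'Exam'});

rng(0)                                       % reproducible fold assignment
cvmdl = fitcnb(Tcat, 'Exam', 'KFold', 5);    % categorical response, no conversion
err   = kfoldLoss(cvmdl);                    % average misclassification rate

fprintf('5-fold misclassification rate: %.2f\n', err)
```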

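I think your main problem is that the dataset is too small. You have n = 5, which is not enough to fit even an unvalidated model: with reference (dummy) coding, the six predictors (Father_Job alone has five levels) require 10 coefficients including the intercept, twice the number of observations.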
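The coefficient count behind that claim can be checked directly. This sketch assumes fitglm's default reference coding, where a categorical predictor with L levels contributes L - 1 coefficients and each numeric predictor (Age, Health) contributes one; `nLevels` and `nCoef` are illustrative names.

```matlab
% Categorical columns from the question's sample (numeric ones are counted below)
School       = categorical({'GB'; 'UR'; 'GB'; 'GB'; 'UR'});
Relationship = categorical({'yes'; 'yes'; 'no'; 'no'; 'yes'});
Status       = categorical({'ft'; 'pt'; 'ft'; 'ft'; 'ft'});
Father_Job   = categorical({'pol'; 'ser'; 'oth'; 'ele'; 'cle'});
nObs = numel(School);

% Number of levels of each categorical predictor: [2 2 2 5]
nLevels = [numel(categories(School)), numel(categories(Relationship)), ...
           numel(categories(Status)), numel(categories(Father_Job))];

% Intercept + (L - 1) per categorical predictor + 1 each for Age and Health
nCoef = 1 + sum(nLevels - 1) + 2;

fprintf('%d coefficients vs. %d observations\n', nCoef, nObs)
% prints: 10 coefficients vs. 5 observations
```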
