r - When using cross validation, is there a way to ensure each fold contains at least several instances of the true class?

Updated: 2023-09-10

I am using caret to run k-fold cross validation:

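library(caret)
library(dplyr)  # provides select()

## tuning & parameters
set.seed(123)
train_control <- trainControl(
  method = "cv",
  number = 5,                    # 5-fold cross validation
  savePredictions = TRUE,
  verboseIter = TRUE,
  classProbs = TRUE,             # needed for ROC-based metrics
  summaryFunction = my_summary   # user-defined (not shown); must return a value named "ROC"
)

# training_data and target are defined elsewhere; target is the class label
linear_model <- train(
  x = select(training_data, Avg_Load_Time),
  y = target,
  trControl = train_control,
  method = "glm", # logistic regression
  family = "binomial",
  metric = "ROC"
)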

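The problem is that out of ~5k rows I only have ~120 true cases. When fitting the GLM through caret, this triggers the warning "glm.fit: fitted probabilities numerically 0 or 1 occurred".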

Is there a parameter I can set, or some other way to ensure that each fold contains some true cases?

This is easier when you shuffle the data and have enough examples of each class.
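To check how the true cases end up distributed, one option is to build the folds yourself and count the positives in each. The sketch below uses simulated labels at the scale described in the question (5000 rows, 120 positives); the level names "no" and "yes" are placeholders, not taken from the original data. As far as I know, caret's createFolds() samples within each class when the outcome is a factor, and train() with method = "cv" builds its folds the same way, so the folds should already be roughly stratified.

```r
library(caret)

set.seed(123)
# Simulated, shuffled outcome: 5000 rows with 120 positives (placeholder level names)
target <- factor(sample(c(rep("yes", 120), rep("no", 4880))),
                 levels = c("no", "yes"))

# For a factor outcome, createFolds() samples within each class level,
# so the positives are spread across the folds. returnTrain = TRUE gives
# the training-row indices for each fold.
folds <- createFolds(target, k = 5, returnTrain = TRUE)

# Count the positives in each held-out fold
positives_per_fold <- sapply(folds, function(train_idx) {
  sum(target[-train_idx] == "yes")
})
print(positives_per_fold)

# The same folds can be handed to train() explicitly through `index`
train_control <- trainControl(
  method = "cv",
  number = 5,
  index = folds,
  classProbs = TRUE
)
```

Passing the folds through `index` keeps the split fixed, so the per-fold counts printed above are the ones actually used during resampling.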

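If you do not have enough examples, you can use SMOTE (Synthetic Minority Over-sampling Technique) to increase the size of the minority class, for example with the smotefamily package in R.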

After that, you can run 5- or 10-fold cross validation without running into problems.
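To make the SMOTE idea concrete, here is a hand-written base R sketch of its interpolation step. It is not the smotefamily API, and in practice you would call the package instead. The function name smote_sketch, its arguments, and the simulated Avg_Load_Time values are all assumptions made for illustration. Each synthetic row is placed at a random point on the segment between a minority-class row and one of its k nearest minority-class neighbours.

```r
set.seed(123)
# Hypothetical minority-class predictor values (a stand-in for Avg_Load_Time)
minority <- data.frame(Avg_Load_Time = rexp(120, rate = 0.5))

# Illustrative helper, not part of any package
smote_sketch <- function(X, n_new, k = 5) {
  X <- as.matrix(X)
  d <- as.matrix(dist(X))   # pairwise distances between minority rows
  diag(d) <- Inf            # a point is not its own neighbour
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(X),
                      dimnames = list(NULL, colnames(X)))
  for (j in seq_len(n_new)) {
    i   <- sample.int(nrow(X), 1)        # pick a random minority row
    nn  <- order(d[i, ])[seq_len(k)]     # its k nearest minority neighbours
    nb  <- nn[sample.int(k, 1)]          # choose one neighbour at random
    gap <- runif(1)                      # interpolation weight between 0 and 1
    synthetic[j, ] <- X[i, ] + gap * (X[nb, ] - X[i, ])
  }
  as.data.frame(synthetic)
}

# Generate 480 synthetic minority rows, for 600 minority rows in total
synthetic <- smote_sketch(minority, n_new = 480)
oversampled_minority <- rbind(minority, synthetic)
```

Because every synthetic value lies between two existing minority values, the oversampled set stays inside the range of the original minority class.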
