Appending a fold column to the training data frame using splitTools



I am using this code with the package splitTools:

library(splitTools)
set.seed(3451)
inds <- partition(iris$Sepal.Length, p = c(train = 0.8, test = 0.2))
train <- iris[inds$train,]
test <- iris[inds$test,]
folds <- create_folds(train$Sepal.Length, k = 5)

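The returned object folds is a list of integer vectors. Is it possible to add a column fold to the data frame that contains the fold number (1, 2, 3, 4, or 5 in this case)? Thanks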

PS:

My painful attempt:

results <- NULL
index <- 1
for (fold in folds) {
t <- train[-fold,]
t$fold <- index
index <- index + 1
results <- rbind(results, t)
}
table(results$fold)
train <- results
head(train)

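If you are interested in getting the "out-of-fold" partition for each row, it is easier to go through partition. create_folds itself calls partition, so you do not lose any logic by doing so: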

iris$fold <- partition(iris$Sepal.Length, p = rep(0.2, 5), split_into_list = FALSE)
# head(iris) gives
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species fold
1          5.1         3.5          1.4         0.2  setosa    4
2          4.9         3.0          1.4         0.2  setosa    2
3          4.7         3.2          1.3         0.2  setosa    4
4          4.6         3.1          1.5         0.2  setosa    5
5          5.0         3.6          1.4         0.2  setosa    4
6          5.4         3.9          1.7         0.4  setosa    3

Disclaimer: I am the author of splitTools and would very much appreciate hints on how to improve the package :-).
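Since the question asks about train rather than the full iris data, the same idea can be applied after the train/test split. The following is only a sketch under that assumption: it reuses the seed and split from the question and the partition call shown above, and it assumes that an unnamed p returns plain integer ids, as the output above suggests.

```r
library(splitTools)

# Same train/test split as in the question
set.seed(3451)
inds <- partition(iris$Sepal.Length, p = c(train = 0.8, test = 0.2))
train <- iris[inds$train, ]

# One fold id per training row instead of a list of index vectors
train$fold <- partition(train$Sepal.Length, p = rep(0.2, 5),
                        split_into_list = FALSE)

table(train$fold)  # fold sizes should be roughly equal
head(train)
```

Unlike the rbind loop in the question, this keeps the rows of train in their original order.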

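You need to convert the folds into a data frame, then create an index for the rows and add TRUE/FALSE values according to the fold. Here is the code: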

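library(dplyr)
library(tidyr)  # pivot_wider() comes from tidyr, not dplyr
# Bind the fold indices into one long data frame
L <- lapply(folds, function(x) data.frame(val = x))
dffolds <- do.call(rbind, L)
# Row names look like "Fold1.23"; keep only the fold name
dffolds$Fold <- gsub('\\..*', '', rownames(dffolds))
rownames(dffolds) <- NULL
# Reshape to one logical column per fold
Folds <- dffolds %>%
  mutate(V = TRUE) %>%
  pivot_wider(names_from = Fold, values_from = V, values_fill = FALSE)
# Merge onto the training data by row position
train2 <- train %>%
  mutate(val = row_number()) %>%
  left_join(Folds, by = "val") %>%
  select(-val)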

Output (some rows and columns):

Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Fold1 Fold2 Fold3
1            4.9         3.0          1.4         0.2     setosa  TRUE FALSE  TRUE
2            4.7         3.2          1.3         0.2     setosa FALSE  TRUE  TRUE
3            5.0         3.6          1.4         0.2     setosa  TRUE  TRUE FALSE
4            5.4         3.9          1.7         0.4     setosa FALSE  TRUE  TRUE
5            4.6         3.4          1.4         0.3     setosa  TRUE FALSE  TRUE
6            5.0         3.4          1.5         0.2     setosa  TRUE FALSE  TRUE
7            4.9         3.1          1.5         0.1     setosa FALSE  TRUE  TRUE
8            5.4         3.7          1.5         0.2     setosa  TRUE FALSE  TRUE
9            4.8         3.0          1.4         0.1     setosa  TRUE  TRUE FALSE
10           4.3         3.0          1.1         0.1     setosa  TRUE FALSE  TRUE
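In the wide table above, each row is FALSE in exactly one fold column, and that column is the fold in which the row is held out. If a single fold number per row is what is wanted, the list can be collapsed directly in base R. The sketch below uses a small hand-made list in place of the real folds object so that it runs without any packages. It assumes that each list element holds the in-sample row indices of that fold, which is how create_folds appears to behave by default.

```r
# Toy stand-in for the create_folds() result: 6 rows, 3 folds.
# Each element lists the rows used for TRAINING in that fold (assumption).
n <- 6
folds <- list(Fold1 = c(3L, 4L, 5L, 6L),
              Fold2 = c(1L, 2L, 5L, 6L),
              Fold3 = c(1L, 2L, 3L, 4L))

fold_id <- integer(n)
for (i in seq_along(folds)) {
  # Rows missing from fold i are the ones held out in fold i
  held_out <- setdiff(seq_len(n), folds[[i]])
  fold_id[held_out] <- i
}

fold_id
```

With the real data, n would be nrow(train), and the result could be stored with train$fold <- fold_id.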
