I am using this code with the splitTools package:
library(splitTools)
set.seed(3451)
inds <- partition(iris$Sepal.Length, p = c(train = 0.8, test = 0.2))
train <- iris[inds$train,]
test <- iris[inds$test,]
folds <- create_folds(train$Sepal.Length, k = 5)
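The returned object folds is a list of integer vectors. Is it possible to add a column fold to the data frame train that contains the fold number (in this case 1, 2, 3, 4, or 5)? Thanks.
PS:
A painful attempt: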
results <- NULL
index <- 1
for (fold in folds) {
t <- train[-fold,]
t$fold <- index
index <- index + 1
results <- rbind(results, t)
}
table(results$fold)
train <- results
head(train)
If you are interested in the "out-of-fold" partition for each row, this is easier with partition. create_folds itself calls partition, so no logic is lost by doing it this way:
iris$fold <- partition(iris$Sepal.Length, p = rep(0.2, 5), split_into_list = FALSE)
# Gives
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species fold
1          5.1         3.5          1.4         0.2  setosa    4
2          4.9         3.0          1.4         0.2  setosa    2
3          4.7         3.2          1.3         0.2  setosa    4
4          4.6         3.1          1.5         0.2  setosa    5
5          5.0         3.6          1.4         0.2  setosa    4
6          5.4         3.9          1.7         0.4  setosa    3
Disclaimer: I am the author of splitTools and would be very grateful for hints on how to improve the package :-).
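If the folds list already exists and should be reused, the fold number can also be recovered from it directly. The sketch below assumes each element of folds holds the in-sample row indices of one fold, which is what the question's train[-fold, ] implies. To stay self-contained it uses a toy train data frame and a hand-built folds list as stand-ins (both hypothetical, not produced by splitTools):

```r
# Toy stand-in for `train`: 10 rows of made-up data (not iris).
train <- data.frame(x = seq(0.5, 5, by = 0.5))
n <- nrow(train)

# Hand-built stand-in for the create_folds() result: each element lists the
# in-sample rows of one fold, i.e. every row except that fold's held-out rows.
held_out <- split(seq_len(n), rep(1:5, length.out = n))
folds <- lapply(held_out, function(idx) setdiff(seq_len(n), idx))
names(folds) <- paste0("Fold", seq_along(folds))

# Rows missing from a fold's in-sample index are its out-of-fold rows;
# label those rows with the fold number.
fold_id <- rep(NA_integer_, n)
for (i in seq_along(folds)) {
  fold_id[-folds[[i]]] <- i
}
train$fold <- fold_id

# Every row should have received exactly one fold number.
table(train$fold, useNA = "ifany")
```

With the real objects from the question, only the loop and the assignment to train$fold would be needed. Unlike the rbind approach in the question, this keeps the original row order of train.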
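You need to convert the folds into a data frame, then create an index for the rows and add TRUE/FALSE values according to each fold. Here is the code: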
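library(dplyr)
library(tidyr)  # pivot_wider() comes from tidyr, not dplyr
# Bind: one row per (fold, in-sample row index) pair
L <- lapply(folds, function(x) data.frame(val = x))
dffolds <- do.call(rbind, L)
# Row names look like "Fold1.1"; strip the suffix to keep only the fold name
dffolds$Fold <- gsub('\\..*', '', rownames(dffolds))
rownames(dffolds) <- NULL
# Reshape: one logical column per fold, TRUE where the row index is in that fold
Folds <- dffolds %>%
  mutate(V = TRUE) %>%
  pivot_wider(names_from = Fold, values_from = V, values_fill = FALSE)
# Merge: join on the row index, then drop the helper column
train2 <- train %>%
  mutate(val = row_number()) %>%
  left_join(Folds, by = "val") %>%
  select(-val)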
Output (some rows and columns):
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Fold1 Fold2 Fold3
 1          4.9         3.0          1.4         0.2  setosa  TRUE FALSE  TRUE
 2          4.7         3.2          1.3         0.2  setosa FALSE  TRUE  TRUE
 3          5.0         3.6          1.4         0.2  setosa  TRUE  TRUE FALSE
 4          5.4         3.9          1.7         0.4  setosa FALSE  TRUE  TRUE
 5          4.6         3.4          1.4         0.3  setosa  TRUE FALSE  TRUE
 6          5.0         3.4          1.5         0.2  setosa  TRUE FALSE  TRUE
 7          4.9         3.1          1.5         0.1  setosa FALSE  TRUE  TRUE
 8          5.4         3.7          1.5         0.2  setosa  TRUE FALSE  TRUE
 9          4.8         3.0          1.4         0.1  setosa  TRUE  TRUE FALSE
10          4.3         3.0          1.1         0.1  setosa  TRUE FALSE  TRUE
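Since the question asks for a single column holding the fold number, the wide TRUE/FALSE layout can be collapsed in one more step. This is only a sketch under two assumptions: TRUE means the row belongs to that fold's in-sample set, and every row has exactly one FALSE (its out-of-fold fold). The small train2 below is a hypothetical stand-in with three fold columns, not the real output:

```r
# Toy stand-in for the wide result (made-up values, three folds only).
train2 <- data.frame(
  x     = c(4.9, 4.7, 5.0, 5.4),
  Fold1 = c(TRUE, FALSE, TRUE, FALSE),
  Fold2 = c(FALSE, TRUE, TRUE, TRUE),
  Fold3 = c(TRUE, TRUE, FALSE, TRUE)
)

# Pick out the fold indicator columns and the fold number encoded in each name.
fold_cols <- grep("^Fold[0-9]+$", names(train2), value = TRUE)
fold_num  <- as.integer(sub("^Fold", "", fold_cols))
in_sample <- as.matrix(train2[fold_cols])

# Guard the assumption: exactly one FALSE (held-out fold) per row.
stopifnot(all(rowSums(!in_sample) == 1))

# For each row, find the column that is FALSE and map it to its fold number.
held_out_col <- unname(apply(!in_sample, 1, which))
train2$fold  <- fold_num[held_out_col]

train2[c("x", "fold")]
```

Reading the number from the column name rather than the column position keeps the result correct even if the fold columns end up in a different order after reshaping.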