r - Splitting a data frame for decision tree learning using stratified sampling



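I want to create a training set and a test set using stratified sampling. I searched around, but every package I found returns a data frame rather than an expression. The tree package I use to build the tree requires the subset to be given as an expression (for example, a vector of row indices).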

Example code:

library(tree)
library(ISLR)
library(dplyr)
Carseats <- Carseats %>% mutate(High = factor(ifelse(Sales <= 8, "No", "Yes")))
set.seed(2)
# Simple random sample of 70% of the row numbers (not stratified by High)
train_sample <- sample(nrow(Carseats), nrow(Carseats) * 0.7)
carseats_test <- Carseats[-train_sample, ]
tree.carseats <- tree(High ~ . - Sales, Carseats, subset = train_sample)

Is it possible to modify the code above so that the training sample is drawn with stratified sampling?

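You can sample 70% of the rows within each level of High, keep their original row numbers, and pass those row numbers to subset: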

library(tree)
library(ISLR)
library(dplyr)
Carseats <- Carseats %>% mutate(High = factor(ifelse(Sales <= 8, "No", "Yes")))
mean(Carseats$High == "Yes")   # overall share of "Yes"
#> [1] 0.41
set.seed(2)
# Sample 70% of the rows within each level of High and keep their row numbers
train_sample <- Carseats %>%
  tibble::rownames_to_column() %>%
  group_by(High) %>%
  sample_n(0.7 * n()) %>%
  mutate(rowname = as.numeric(rowname)) %>%
  pull(rowname)
carseats_test <- Carseats[-train_sample, ]
mean(carseats_test$High == "Yes")  # the share of "Yes" is preserved in the test set
#> [1] 0.4132231
tree.carseats <- tree(High ~ . - Sales, Carseats, subset = train_sample)
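If you would rather avoid the dplyr and tibble dependencies, the same per-class sampling can be sketched in base R. This is a minimal sketch under stated assumptions, not part of the tree package or of the answer above: stratified_index() is a hypothetical helper name, and the outcome vector y (236 "No", 164 "Yes") is a toy stand-in chosen to match the 0.41 share of "Yes" in Carseats. The helper splits the row numbers by class, draws floor(prop * class size) rows from each class without replacement, and returns a plain vector of row indices, which is the kind of value that tree()'s subset argument accepts.

```r
# Hypothetical helper (not from any package): return row indices containing
# floor(prop * n_k) rows from each level k of the stratifying factor.
stratified_index <- function(strata, prop = 0.7) {
  strata <- as.factor(strata)
  # One vector of row numbers per class, e.g. list(No = c(...), Yes = c(...))
  groups <- split(seq_along(strata), strata)
  # Sample without replacement inside each class separately
  picked <- lapply(groups, function(idx) {
    idx[sample.int(length(idx), floor(prop * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Toy outcome with the same 41% "Yes" share as Carseats$High (assumed counts)
set.seed(2)
y <- factor(rep(c("No", "Yes"), times = c(236, 164)))
train <- stratified_index(y, prop = 0.7)
length(train)             # 165 "No" + 114 "Yes" = 279 rows
mean(y[-train] == "Yes")  # 50 / 121, the same share as the dplyr version

# With the real data (requires the tree and ISLR packages and the High column):
# tree.carseats <- tree(High ~ . - Sales, Carseats,
#                       subset = stratified_index(Carseats$High, 0.7))
```

Indexing with idx[sample.int(length(idx), k)] rather than sample(idx, k) avoids the case where sample() treats a single number n as 1:n, which would matter if a class had only one row.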
