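I want to create training and test sets using stratified sampling. I have looked around, but every package I found returns a data frame rather than an expression. The tree package I am using to build the tree requires the subset to be given as an expression.
Example code: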
library(tree)
library(ISLR)
library(dplyr)
Carseats <- Carseats %>% mutate(High = factor(ifelse(Sales <= 8, "No", "Yes")))
set.seed(2)
train_sample <- sample(nrow(Carseats), nrow(Carseats) * 0.7)
carseats_test <- Carseats[-train_sample,]
tree.carseats <- tree(High~ . -Sales, Carseats, subset = train_sample)
Is it possible to modify the code above so that the sampling is stratified?
You could do:
library(tree)
library(ISLR)
library(dplyr)
Carseats <- Carseats %>% mutate(High = factor(ifelse(Sales <= 8, "No", "Yes")))
mean(Carseats$High == "Yes")
#> [1] 0.41
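set.seed(2)  # reproducible split, as in the question
train_sample <- Carseats %>%
  tibble::rownames_to_column() %>%          # keep the original row numbers
  group_by(High) %>%                        # stratify on the response
  sample_n(0.7*n()) %>%                     # about 70% of the rows within each class
  mutate(rowname = as.numeric(rowname)) %>%
  pull(rowname)                             # plain vector of row indices for subset =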
carseats_test <- Carseats[-train_sample,]
mean(carseats_test$High == "Yes")
#> [1] 0.4132231
tree.carseats <- tree(High~ . -Sales, Carseats, subset = train_sample)
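If you would rather not go through dplyr, the same idea can be sketched in base R: split the row indices by class, sample a fixed share inside each class, and return a plain index vector that `subset =` accepts. The helper name `stratified_indices` is hypothetical (it does not come from tree, ISLR, or dplyr), and the labels below are synthetic, built only to mirror the 0.41 share of "Yes" over 400 rows shown above.

```r
# Hypothetical helper (not part of tree, ISLR, or dplyr): returns row indices
# for a stratified sample, taking floor(prop * class size) rows from each level.
stratified_indices <- function(y, prop) {
  groups <- split(seq_along(y), y)  # row indices grouped by class
  picked <- lapply(groups, function(idx) {
    # Index into idx instead of calling sample(idx, ...) directly, because
    # sample(x, ...) treats a length-one numeric x as 1:x.
    idx[sample.int(length(idx), floor(prop * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(2)
# Synthetic labels (an assumption for illustration): 236 "No" and 164 "Yes"
y <- factor(rep(c("No", "Yes"), times = c(236, 164)))
train_idx <- stratified_indices(y, 0.7)

length(train_idx)              # 165 + 114 rows
mean(y[train_idx] == "Yes")    # class share in the training rows
mean(y[-train_idx] == "Yes")   # class share in the held-out rows
```

With the data from the question, this would presumably be called as `train_sample <- stratified_indices(Carseats$High, 0.7)`, and the result passed to `tree(..., subset = train_sample)` exactly as before.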