I'm trying to run the following function to balance a training set with the ROSE package:
library(ROSE)
rose <- function(df){
  str(df)
  set.seed(124)
  intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
  train <- df[intrain,]
  train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data
  return(train.rose)
}
data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset
rose(df)
Running the script above produces the following error:
Error in terms.formula(formula, data = frml.env) :
'data' argument is of the wrong type
By contrast, when I call ovun.sample(...) outside the local function rose, everything works fine:
library(ROSE)
data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset
str(df)
set.seed(124)
intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[intrain,]
train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data
I know the problem occurs when ovun.sample(..., data=train, ...) is called inside rose(), but I can't figure out why. Could it be an issue with R environments (scoping)?
Any ideas?
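One way to test the environment hypothesis is a small base-R mock-up. It assumes, without having checked ROSE's source, that ovun.sample() rebuilds its own call with match.call() and evaluates it inside its own frame, a common pattern in modelling functions. The functions inner_fit(), outer_wrapper(), caller() and caller_fixed() are hypothetical stand-ins, not part of ROSE. Under that assumption, the symbol train is looked up from the package function's frame and its enclosures (namespace, global environment, attached packages), never in rose()'s local frame. At the top level, train lives in the global environment, which would explain why the same call works there. If an attached package happens to export a function named train, the lookup could find that function instead, which would be consistent with "'data' argument is of the wrong type". The do.call() variant is a possible workaround because it places the data frame itself into the call.

```r
# Hypothetical stand-ins (NOT ROSE's code) that mimic a function
# re-evaluating its matched call inside its own frame.
inner_fit <- function(formula, data) {
  nrow(data)  # forces evaluation of the 'data' argument
}

outer_wrapper <- function(formula, data) {
  cl <- match.call()     # 'data' is kept as the unevaluated symbol, e.g. `train`
  cl[[1L]] <- inner_fit  # redirect the call to the internal worker
  eval(cl)               # evaluated in outer_wrapper's frame, not the caller's
}

# Like rose(): 'train' exists only in this function's local frame.
caller <- function(df) {
  train <- df[1:3, , drop = FALSE]
  outer_wrapper(y ~ ., data = train)
}

# Workaround sketch: do.call() inlines the data frame value itself,
# so re-evaluating the call needs no lookup of the name `train`.
caller_fixed <- function(df) {
  train <- df[1:3, , drop = FALSE]
  do.call(outer_wrapper, list(formula = y ~ ., data = train))
}

toy <- data.frame(y = 1:5, x = 6:10)
failed <- tryCatch(caller(toy), error = function(e) conditionMessage(e))
print(failed)             # "object 'train' not found"
print(caller_fixed(toy))  # [1] 3
```

If the assumption about ovun.sample() holds, the analogous change inside rose() would be do.call(ovun.sample, list(formula = cls ~ ., data = train, N = nrow(train), p = 0.5, seed = 1, method = "both"))$data, though that has not been verified against ROSE here.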
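I ran the code without calling set.seed() inside the function and it worked for me; you should set the seed outside the function instead. Also, you may have loaded some packages that are confusing R.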
rose <- function(df){
  str(df)
  intrain <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
  train <- df[intrain,]
  train.rose <- ovun.sample(cls ~ ., data=train, N=nrow(train), p=0.5, seed=1, method="both")$data
  return(train.rose)
}
data(hacide)
df <- rbind(hacide.train, hacide.test) # just to simulate a complete dataset
set.seed(1234)
head(rose(df))
'data.frame': 1250 obs. of 3 variables:
$ cls: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ x1 : num 0.2008 0.0166 0.2287 0.1264 0.6008 ...
$ x2 : num 0.678 1.5766 -0.5595 -0.0938 -0.2984 ...
cls x1 x2
1 0 -0.2247632 0.6806409
2 0 0.3437585 -1.0202996
3 0 -1.0226182 1.9629034
4 0 0.7245372 -0.2494658
5 0 -0.8972314 0.2397664
6 0 0.3361091 -0.2661655
Note also that the str output shown describes the original df, not the transformed (balanced) data.