I want to calculate Information Value (IV) and Weight of Evidence (WoE) with the Information package in R. I'm using the following code:
library(Information)
df_iv <- df
IV <- create_infotables(data=df_iv[, -1],
y = "stroke",
bins=10)
Initially my dependent variable was a factor, and I got the following error:
Error in CheckInputs(data, valid, trt, y) :
ERROR: the dependent variable stroke is a factor in training dataset -- has to be numeric
I then converted the dependent variable to numeric, as follows:
library(Information)
df_iv <- df
df_iv$stroke <- as.numeric(df_iv$stroke)
IV <- create_infotables(data=df_iv[, -1],
y = "stroke",
bins=10)
Now I get this error instead:
Error in CheckInputs(data, valid, trt, y) :
ERROR: the dependent variable has to be binary. Check your training and validation datasets.
My dependent variable has only two values, "0" and "1". Why is this happening?
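Converting a factor with as.numeric() does not return the level labels; it returns the factor's internal integer codes, which start at 1. So if your dependent variable was originally coded 0/1, it ends up as 1/2, which is no longer binary in the sense create_infotables() checks for, and that triggers the error.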
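To see the conversion in isolation, here is a small sketch on a hypothetical four-element factor (`stroke` below is illustrative data, not the asker's dataset). It also shows a second option that is not part of the original answer: converting through `as.character()` first, which uses the level labels rather than the internal codes, so it does not depend on the levels being ordered "0", "1".

```r
# Hypothetical 0/1 outcome stored as a factor, as in the question
stroke <- factor(c(0, 1, 1, 0))

# as.numeric() on a factor returns the internal level codes, starting at 1
codes <- as.numeric(stroke)
print(codes)    # [1] 1 2 2 1

# Subtracting 1 maps the codes back to 0/1 (assumes levels are "0" then "1")
shifted <- as.numeric(stroke) - 1
print(shifted)  # [1] 0 1 1 0

# Converting via the level labels recovers the original values directly
labels <- as.numeric(as.character(stroke))
print(labels)   # [1] 0 1 1 0
```

Both `shifted` and `labels` are valid 0/1 numeric outcomes; the `as.character()` route is the safer choice when you are not sure how the factor levels are ordered.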
A simple fix is to subtract 1 from the resulting numeric values, which maps 1/2 back to 0/1.
For example:
library(Information)
# succeeds, 0/1 by default
create_infotables(data = mtcars, y = 'vs')
# fails, now 1/2 numeric
mtcars$vs_converted <- as.numeric(as.factor(mtcars$vs))
create_infotables(data = mtcars, y = 'vs_converted')
# succeeds, now 0/1 binary
mtcars$vs_converted <- as.numeric(as.factor(mtcars$vs)) - 1
create_infotables(data = mtcars, y = 'vs_converted')