How to interpret the different results from mlr versus other packages such as rpart and mboost in R



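I am doing survival analysis with mlr and with other packages. In mlr I use surv.rpart and surv.glmboost. I also fit the same data with the original packages rpart and mboost, and the results differ. Here is an example: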

> myData2 <- data.frame(DaySum=c(3,2,1,6,3,2,2,5,2,7,2),
DaysDiff=c(24,4,5,12,3,31,131,6,35,18,19),
Status='TRUE')
> myData2$Status <- as.logical(myData2$Status)
> myTrain <- c(1:(nrow(myData2)-1))
> myTest <- nrow(myData2)

When I use surv.rpart in mlr, the result is:

> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status'))
> surv.lrn <- makeLearner("surv.rpart")
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain)
> surv.pred <- predict(mod,task=surv.task,subset=myTest)
> surv.pred
Prediction: 1 observations
predict.type: response
threshold: 
time: 0.00
id truth.time truth.event response
11 11         19        TRUE        1

If I use the original rpart package, the result is:

> train <- myData2[1:(nrow(myData2)-1),]
> test <- myData2[nrow(myData2),]
> fit <- rpart(DaysDiff~DaySum,data=train)
> predict(fit,newdata=test)
[1] 26.9

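Why do I get two different results? It looks as if rpart directly gives me the result I want, while the mlr result has gone through some kind of transformation. The same thing happens when I use surv.glmboost: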

> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status'))
Warning messages:
1: Unknown or uninitialised column: 'Weibull'. 
2: Unknown or uninitialised column: 'Cox'. 
3: Unknown or uninitialised column: 'Month2'. 
4: Unknown or uninitialised column: 'Month2'. 
5: Unknown or uninitialised column: 'Month'. 
6: Unknown or uninitialised column: 'Month'. 
7: Unknown or uninitialised column: 'MonthsDiff'. 
8: Unknown or uninitialised column: 'Weibull'. 
9: Unknown or uninitialised column: 'Cox'. 
> surv.lrn <- makeLearner("surv.glmboost")
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain)
Warning message:
In names(data) != all.vars(formula[[2]]) :
longer object length is not a multiple of shorter object length
> surv.pred <- predict(mod,task=surv.task,subset=myTest)
> surv.pred
Prediction: 1 observations
predict.type: response
threshold: 
time: 0.00
id truth.time truth.event   response
11 11         19        TRUE -0.1946239

And here is the result using the mboost package:

> train <- myData2[1:(nrow(myData2)-1),]
Warning messages:
1: Unknown or uninitialised column: 'Weibull'. 
2: Unknown or uninitialised column: 'Cox'. 
3: Unknown or uninitialised column: 'Month2'. 
4: Unknown or uninitialised column: 'Month2'. 
5: Unknown or uninitialised column: 'Month'. 
6: Unknown or uninitialised column: 'Month'. 
7: Unknown or uninitialised column: 'MonthsDiff'. 
8: Unknown or uninitialised column: 'Weibull'. 
9: Unknown or uninitialised column: 'Cox'. 
> test <- myData2[nrow(myData2),]
> fit <- glmboost(DaysDiff~DaySum,data=train)
> predict(fit,newdata=test)
[,1]
[1,] 33.08294

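This is what I have found so far. It may also happen with other learners, such as surv.cforest. My questions are: why does this happen, and how can I get results like those from rpart and mboost when using the mlr package?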

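Your problem is that you did not fit a survival model with rpart and glmboost, but a plain regression model: the formula DaysDiff ~ DaySum treats DaysDiff as an ordinary numeric response and ignores Status entirely.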
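
As a minimal sketch of why the plain rpart call returns 26.9: with a numeric response rpart grows an "anova" regression tree, and with only 10 training rows (fewer than the default minsplit of 20) the tree stays a single node, so the prediction is just the training mean of DaysDiff. The names fit_reg and pred_reg are introduced here for illustration only.

```r
library(rpart)

# Same data as in the question, split into the first 10 rows and the last row.
train <- data.frame(DaySum   = c(3, 2, 1, 6, 3, 2, 2, 5, 2, 7),
                    DaysDiff = c(24, 4, 5, 12, 3, 31, 131, 6, 35, 18),
                    Status   = TRUE)
test  <- data.frame(DaySum = 2, DaysDiff = 19, Status = TRUE)

# Numeric response on the left-hand side: rpart fits a regression tree.
fit_reg <- rpart(DaysDiff ~ DaySum, data = train)
print(fit_reg$method)

# The single-node tree predicts the mean of the training response.
pred_reg <- predict(fit_reg, newdata = test)
print(all.equal(unname(pred_reg), mean(train$DaysDiff)))
```

The value is on the scale of days, whereas a survival tree reports a relative event rate, which is why the two numbers are not comparable.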

Fitting a survival model in rpart looks like this:

fit = rpart(Surv(DaysDiff, event = Status) ~ DaySum,data=train, method = "exp")
predict(fit,newdata=test)

So the complete comparison code gives the same result (each prediction is 1):

library(mlr)
library(rpart)
library(survival)

myData2 = data.frame(DaySum=c(3,2,1,6,3,2,2,5,2,7,2),
DaysDiff=c(24,4,5,12,3,31,131,6,35,18,19),
Status='TRUE')
myData2$Status = as.logical(myData2$Status)

# Hold out the last row for prediction.
train = myData2[1:(nrow(myData2)-1),]
test = myData2[nrow(myData2),]

# mlr: the task is built on the training rows only, so no subset argument is needed.
surv.task = makeSurvTask(data=train,target=c('DaysDiff','Status'))
surv.lrn = makeLearner("surv.rpart")
mod = train(learner=surv.lrn,task=surv.task)
surv.pred = predict(mod,newdata = test)
surv.pred

# rpart directly: Surv() on the left-hand side makes this a survival tree.
fit = rpart(Surv(DaysDiff, event = Status) ~ DaySum,data=train, method = "exp")
predict(fit,newdata=test)
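
The same reasoning should carry over to glmboost. The following is a hedged sketch, not a verified reproduction of the mlr output: it assumes that surv.glmboost fits a Cox-type model (which the negative response value in the question suggests), and it uses mboost's CoxPH() family with default boosting settings. The name fit_cox is introduced here for illustration, and the number it returns may not match mlr exactly if mlr passes different control settings.

```r
library(mboost)
library(survival)

# Same data as in the question, split into the first 10 rows and the last row.
train <- data.frame(DaySum   = c(3, 2, 1, 6, 3, 2, 2, 5, 2, 7),
                    DaysDiff = c(24, 4, 5, 12, 3, 31, 131, 6, 35, 18),
                    Status   = TRUE)
test  <- data.frame(DaySum = 2, DaysDiff = 19, Status = TRUE)

# Surv() on the left-hand side plus a survival family makes this a survival model.
# Without them, glmboost defaults to Gaussian regression on DaysDiff.
fit_cox <- glmboost(Surv(DaysDiff, Status) ~ DaySum,
                    data = train, family = CoxPH())

# The prediction is a linear predictor (log relative risk), not a number of days.
lp <- predict(fit_cox, newdata = test)
print(lp)
```

A prediction on this scale is only meaningful relative to other observations, so it cannot be compared with the 33.08 days returned by the Gaussian glmboost fit.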
