r - Predicting counts with mlr

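I am using the learner regr.gbm to predict counts. Outside of mlr, using the gbm package directly, I set distribution = "poisson" and call predict.gbm with type = "response", which returns predictions on the original scale. When I do the same through mlr, however, the predictions appear to be on the log scale: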

     truth    response
913      4  0.67348708
914      1  0.28413256
915      3  0.41871237
916      1  0.13027792
2101     1 -0.02092168
2102     2  0.23394970

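The "truth" column, however, is not on the log scale, so I am worried that the hyperparameter tuning routines in mlr will not work. For comparison, this is the output I get with distribution = "gaussian":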

     truth response
913      4 2.028177
914      1 1.334658
915      3 1.552846
916      1 1.153072
2101     1 1.006362
2102     2 1.281811

What is the best way to handle this?

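This happens because gbm by default predicts on the scale of the link function (which is log for distribution = "poisson"). This is controlled by the type parameter of gbm::predict.gbm (see the help page of that function). Unfortunately, mlr does not by default offer a way to change this parameter (this has been reported in the mlr bug tracker). For now, the workaround is to add the parameter manually: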

library(mlr)

lrn <- makeLearner("regr.gbm", distribution = "poisson")
# register "type" as a predict-time parameter so mlr passes it to predict.gbm
lrn$par.set <- c(lrn$par.set,
  makeParamSet(
    makeDiscreteLearnerParam("type", c("link", "response"),
      default = "link", when = "predict", tunable = FALSE)))
lrn <- setHyperPars(lrn, type = "response")

# show that it works:
counttask <- makeRegrTask("counttask", getTaskData(pid.task),
  target = "pregnant")
pred <- predict(train(lrn, counttask), counttask)
pred
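For predictions that were already produced on the link scale, such as the ones shown in the question, the values can also be converted by hand, because the inverse of the log link is exp(). A minimal base-R sketch (the variable names are illustrative, not part of mlr or gbm):

```r
# Link-scale predictions copied from the question's output
link_pred <- c(0.67348708, 0.28413256, 0.41871237,
               0.13027792, -0.02092168, 0.23394970)

# For distribution = "poisson" the link is log, so exp() maps back to counts
response_pred <- exp(link_pred)
print(round(response_pred, 3))
```

This only repairs predictions after the fact; performance measures computed inside mlr would still see the link-scale values, which is why setting type = "response" on the learner is the more useful fix for tuning.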

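Note that when tuning parameters for count data, the default regression measure (mean squared error) may overemphasize the fit on data points with large count values. Predicting "10" instead of "1" has the same squared error as predicting "1010" instead of "1001", but depending on your goal, you may want to give more weight to the first error in this example.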

One possible solution is to use the (normalized) mean Poisson log likelihood as the measure:

poisllmeasure = makeMeasure(
  id = "poissonllnorm",
  minimize = FALSE,
  best = 0,
  worst = -Inf,
  properties = "regr",
  name = "Mean Poisson Log Likelihood",
  note = "For count data. Normalized to 0 for perfect fit.",
  fun = function(task, model, pred, feats, extra.args) {
    mean(dpois(pred$data$truth, pred$data$response, log = TRUE) -
      dpois(pred$data$truth, pred$data$truth, log = TRUE))
  })

# example
performance(pred, poisllmeasure)
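To see how this measure treats the "1 vs. 10" and "1001 vs. 1010" errors mentioned earlier, the per-observation terms can be computed directly in base R. This is only an illustrative sketch; the vectors truth and response are made-up values taken from that example, not model output:

```r
# Two hypothetical observations: a small count and a large count,
# each mispredicted by the same absolute amount (9)
truth    <- c(1, 1001)
response <- c(10, 1010)

# Squared error is identical for both cases
sq_err <- (response - truth)^2

# Normalized Poisson log likelihood, same formula as inside poisllmeasure
pois_ll <- dpois(truth, response, log = TRUE) -
  dpois(truth, truth, log = TRUE)

print(data.frame(truth, response, sq_err, pois_ll))
```

Both rows have a squared error of 81, while the normalized log likelihood is about -6.7 for the small count and about -0.04 for the large count, so the first error is penalized far more heavily.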

This measure can be used for tuning by passing it to the measures argument of tuneParams(). (Note that it must be given in a list: tuneParams(... measures = list(poisllmeasure) ...))
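A tuning call along these lines might look as follows. This is a sketch under assumptions, not a recommendation: the search space ps, the random search with 5 iterations, and the 3-fold cross-validation are placeholder choices that do not come from the answer, and the learner, task, and measure are rebuilt here only so the snippet runs on its own.

```r
library(mlr)

# Learner with the manually added predict-time "type" parameter
lrn <- makeLearner("regr.gbm", distribution = "poisson")
lrn$par.set <- c(lrn$par.set,
  makeParamSet(
    makeDiscreteLearnerParam("type", c("link", "response"),
      default = "link", when = "predict", tunable = FALSE)))
lrn <- setHyperPars(lrn, type = "response")

counttask <- makeRegrTask("counttask", getTaskData(pid.task),
  target = "pregnant")

# Normalized mean Poisson log likelihood (0 = perfect fit)
poisllmeasure <- makeMeasure(
  id = "poissonllnorm", minimize = FALSE, best = 0, worst = -Inf,
  properties = "regr",
  fun = function(task, model, pred, feats, extra.args) {
    mean(dpois(pred$data$truth, pred$data$response, log = TRUE) -
      dpois(pred$data$truth, pred$data$truth, log = TRUE))
  })

# Hypothetical search space; the ranges are placeholders
ps <- makeParamSet(
  makeIntegerParam("n.trees", lower = 100, upper = 500),
  makeIntegerParam("interaction.depth", lower = 1, upper = 3),
  makeNumericParam("shrinkage", lower = 0.01, upper = 0.1))

set.seed(1)
res <- tuneParams(lrn, task = counttask,
  resampling = makeResampleDesc("CV", iters = 3),
  measures = list(poisllmeasure),  # must be a list
  par.set = ps,
  control = makeTuneControlRandom(maxit = 5))

res$x  # selected hyperparameters
res$y  # cross-validated value of the measure (closer to 0 is better)
```

Because the measure is declared with minimize = FALSE, tuneParams() maximizes it, which here means picking the setting whose value is closest to 0.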
