R - mlr package - trying to integrate a new cluster learner: default values in par.vals are ignored (in makeRLearnerClus



I am trying to integrate the MiniBatchKmeans function from the ClusterR package into mlr. Following the documentation, I made the following changes:

  1. Created makeRLearner.cluster.MiniBatchKmeans
  2. Created trainLearner.cluster.MiniBatchKmeans
  3. Created predictLearner.cluster.MiniBatchKmeans
  4. Registered the S3 methods above (as described below)

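At this point I can create the learner and call train and predict on it. However, a problem occurs when I create the learner without providing a value for "clusters".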

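The underlying function (in ClusterR) does not define a default value for the "clusters" parameter. Following the mlr approach, I tried to supply a default for "clusters" through the par.vals argument. However, this default is ignored.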

My code:

#' @export
makeRLearner.cluster.MiniBatchKmeans = function() {
  makeRLearnerCluster(
    cl = "cluster.MiniBatchKmeans",
    package = "ClusterR",
    par.set = makeParamSet(
      makeIntegerLearnerParam(id = "clusters", lower = 1L),
      makeIntegerLearnerParam(id = "batch_size", default = 10L, lower = 1L),
      makeIntegerLearnerParam(id = "num_init", default = 1L, lower = 1L),
      makeIntegerLearnerParam(id = "max_iters", default = 100L, lower = 1L),
      makeNumericLearnerParam(id = "init_fraction", default = 1, lower = 0),
      makeDiscreteLearnerParam(id = "initializer", default = "kmeans++",
        values = c("optimal_init", "quantile_init", "kmeans++", "random")),
      makeIntegerLearnerParam(id = "early_stop_iter", default = 10L, lower = 1L),
      makeLogicalLearnerParam(id = "verbose", default = FALSE,
        tunable = FALSE),
      makeUntypedLearnerParam(id = "CENTROIDS", default = NULL),
      makeNumericLearnerParam(id = "tol", default = 1e-04, lower = 0),
      makeNumericLearnerParam(id = "tol_optimal_init", default = 0.3, lower = 0),
      makeIntegerLearnerParam(id = "seed", default = 1L)
    ),
    par.vals = list(clusters = 2L),
    properties = c("numerics", "prob"),
    name = "MiniBatchKmeans",
    note = "Note",
    short.name = "MBatchKmeans",
    callees = c("MiniBatchKmeans", "predict_MBatchKMeans")
  )
}
#' @export
trainLearner.cluster.MiniBatchKmeans = function(.learner, .task, .subset, .weights = NULL, ...) {
  ClusterR::MiniBatchKmeans(getTaskData(.task, .subset), ...)
}
#' @export
predictLearner.cluster.MiniBatchKmeans = function(.learner, .model, .newdata, ...) {
  if (.learner$predict.type == "prob") {
    pred = ClusterR::predict_MBatchKMeans(data = .newdata,
      CENTROIDS = .model$learner.model$centroids,
      fuzzy = TRUE, ...)
    res = pred$fuzzy_clusters
    return(res)
  } else {
    pred = ClusterR::predict_MBatchKMeans(data = .newdata,
      CENTROIDS = .model$learner.model$centroids,
      fuzzy = FALSE, ...)
    res = as.integer(pred)
    return(res)
  }
}

The problem (the default value of clusters defined above is ignored):

## When defining a value of clusters, it works as expected
lrn <- makeLearner("cluster.MiniBatchKmeans", clusters = 3L)
getLearnerParVals(lrn)
# The below commented lines are printed
# $clusters
# [1] 3
## When not providing a value for clusters, default is not used
lrn <- makeLearner("cluster.MiniBatchKmeans")
getLearnerParVals(lrn)
# The below commented lines are printed
# named list()

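Any suggestions as to why I am seeing this behavior? I checked the code of other learners (e.g. cluster.kmeans, cluster.kkmeans) and they successfully define defaults in the same format I used. Moreover, this appears to be the correct approach.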

My code is on GitHub in case it helps reproduce the problem. There is also an added test file (in tests/testthat), but it has problems of its own.

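Edit 1 - Actual error message

This is the actual error message I get when trying to train the learner without explicitly providing a value for "clusters":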

lrn <- makeLearner("cluster.MiniBatchKmeans")
train(lrn, cluster_task)
 Error in ClusterR::MiniBatchKmeans(getTaskData(.task, .subset), ...) : 
  argument "clusters" is missing, with no default 
10.
ClusterR::MiniBatchKmeans(getTaskData(.task, .subset), ...) at RLearner_cluster_MiniBatchKmeans.R#32
9.
trainLearner.cluster.MiniBatchKmeans(.learner = structure(list(
    id = "cluster.MiniBatchKmeans", type = "cluster", package = "ClusterR", 
    properties = c("numerics", "prob"), par.set = structure(list(
        pars = list(clusters = structure(list(id = "clusters",  ... at trainLearner.R#24
8.
(function (.learner, .task, .subset, .weights = NULL, ...) 
{
    UseMethod("trainLearner")
})(.learner = structure(list(id = "cluster.MiniBatchKmeans",  ... 
7.
do.call(trainLearner, pars) at train.R#96
6.
fun3(do.call(trainLearner, pars)) at train.R#96
5.
fun2(fun3(do.call(trainLearner, pars))) at train.R#96
4.
fun1({
    learner.model = fun2(fun3(do.call(trainLearner, pars)))
}) at train.R#96
3.
force(expr) at helpers.R#93
2.
measureTime(fun1({
    learner.model = fun2(fun3(do.call(trainLearner, pars)))
})) at train.R#96
1.
train(lrn, cluster_task) 
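
The failure in the traceback can be reproduced without mlr. The sketch below is only an illustration in base R: fit_clusters and train_with are hypothetical stand-ins (not mlr or ClusterR functions) for a function with a required "clusters" argument and for a wrapper that forwards the stored hyperparameters via do.call, as train.R does in the traceback.

```r
# Hypothetical stand-in for a function whose `clusters` argument has no default.
fit_clusters <- function(data, clusters, batch_size = 10L) {
  list(k = clusters, n = nrow(data))
}

# Hypothetical wrapper: forwards whatever hyperparameters are stored,
# similar in spirit to do.call(trainLearner, pars).
train_with <- function(data, par.vals = list()) {
  do.call(fit_clusters, c(list(data = data), par.vals))
}

# With a stored value, the required argument is forwarded and the call succeeds.
ok <- train_with(iris[, 1:4], par.vals = list(clusters = 2L))

# With an empty par.vals, nothing is forwarded and the call fails.
err <- tryCatch(
  train_with(iris[, 1:4]),
  error = function(e) conditionMessage(e)
)
print(err)
```

If the learner's stored hyperparameters are empty, as in the "named list()" output above, the required argument never reaches the underlying function, which matches the error shown.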

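The code in your repository works for me. Do you actually get an error when you run it? The way you have coded the default is really more of an override than a default. You may want to use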

makeIntegerLearnerParam(id = "clusters", lower = 1L, default = 2L),

and remove par.vals.
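
To make the distinction concrete, here is a minimal sketch that declares the default in the parameter set and reads it back. It assumes the ParamHelpers package (which provides makeParamSet, makeIntegerLearnerParam and getDefaults) is installed, and it reproduces only two of the parameters from the question for brevity.

```r
library(ParamHelpers)

# Declare the default on the parameter itself instead of in par.vals.
ps <- makeParamSet(
  makeIntegerLearnerParam(id = "clusters", lower = 1L, default = 2L),
  makeIntegerLearnerParam(id = "batch_size", default = 10L, lower = 1L)
)

# getDefaults() returns the declared defaults as a named list.
defaults <- getDefaults(ps)
print(defaults$clusters)
```

Note that getLearnerParVals() reports the values explicitly set on a learner (through par.vals or arguments to makeLearner), not the defaults declared in the parameter set, so the two should be inspected separately when debugging.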
