除非存在可选参数，否则以平均值计算导数

假设我有一个函数y=f(x1, x2, v1)，其中x1和x2是连续变量，v1是0或1。例如：

myfunc <- function(x1, x2, v1){
  e <- exp(0.1*x1+0.2*x2+v1)
  return(e)
}

我想创建一个函数h，它将数据帧、函数作为参数，并将函数变量的值作为可选参数。这个函数h应该返回一个带有dy/dx的向量。如果不存在可选参数，则以参数的平均值计算dy/dx。但是，可选地，我希望能够传递一个可以使用的值，而不是平均值。例如，v1=0。这可能吗？

这就是我现在拥有的：

df <- data.frame(x1=rnorm(100, 2, 1), x2= rnorm(100, 4, 1),
                 v1=sample(x = c(0,1), size = 100, replace = T))
numDeriv::grad(myfunc, x=mean(df$x1), x2=mean(df$x2), v1=0)

唉，我想不出一种方法来编写带有可选参数的函数。

进入时：

CCD_ 12是用户输入功能
dx是一个长度为1的字符向量，赋予变量名称（偏）导数是关于（w.r.t）的
dat是一个数据帧，给出了我们想要评估导数的所有值
opt默认为NULL；但如果给定，则必须是一个列表

在退出时，它返回一个向量，给出关于"x1"的偏导数，在df[[x1]]处评估，其他变量固定，要么在df的列平均值处，要么在opt中提供值。

示例调用，使用OP的原始示例，是：

h(myfunc, "x1", df)
h(myfunc, "x1", df, list(v1 = 1))
h(myfunc, "x1", df, list(x2 = 2, v1 = 0))
h(myfunc, "x2", df, list(x1 = 1.2))

此功能是：

h <- function(FUN, dx, dat, opt = NULL) {
  ## check arguments of h
  if (missing(FUN)) stop("No given function: FUN")
  if (!is.function(FUN)) stop("arguments 'FUN' is not a function!")
  if (missing(dx)) stop("No given variable to evaluate partial derivatives: dx")
  if (!is.character(dx)) stop("arguments 'dx' must be a character!")
  if (length(dx) > 1) stop("arguments 'dx' must be of length 1")
  if (missing(dat)) stop("No data provided: dat")
  if (!is.data.frame(dat)) stop("arguments 'dat' must be a data frame!")
  OPT <- !is.null(opt); if (OPT && !is.list(opt)) stop("optional arguments must be provided as a list!")
  ## get arguments of FUN
  FUN_args <- formalArgs(FUN)
  ## get column vars of dat
  dat_vars <- colnames(dat)
  ## get names of optional arguments
  if (OPT) opt_vars <- names(opt)
  ## need to ensure dx is both inside dat and a function argument of FUN
  if (!dx %in% FUN_args) stop(paste("unknown function argumens of FUN:", dx))
  if (!dx %in% dat_vars) stop(paste("variable", dx, "is not in `dat`!"))
  ## now, let's take care of other arguments of FUN, if there are any
  n <- length(FUN_args <- FUN_args[-match(dx, FUN_args)])
  if (n > 0) {
    ## are there optional arguments?
    if (OPT) {
      ## extract optional arguments that are in FUN_args (refining opt)
      opt_vars <- FUN_args[FUN_args %in% opt_vars]
      opt <- opt[opt_vars]
      ## excluce opt_vars from dat_vars
      FUN_args <- FUN_args[-match(opt_vars, FUN_args)]
      }
    ## now, all remaining FUN_args must be found inside dat_vars
    missing_vars <- FUN_args[!FUN_args %in% dat_vars]
    if (length(missing_vars)) stop(paste("Those variables are not found from `dat` or `opt`:", missing_vars))
    ## now, there are no missing vars, so we compute column mean from dat for FUN_args
    tmp <- lapply(dat[FUN_args], FUN = mean)
    ## combine tmp and opt
    opt <- c(opt, tmp)
    }
  ## prepare function call!
  text <- "numDeriv::grad(FUN, x = dat[[dx]]"
  FUN_args <- names(opt)
  for (i in 1:length(opt)) {
    expr <- paste0("opt[['", FUN_args[i], "']]")
    expr <- paste(FUN_args[i], expr, sep = " = ")
    text <- paste(text, expr, sep = ", ")
    }
  expr <- paste0(text,")")
  ## evaluate partial derivatives
  eval(parse(text = expr))
  }

注意，此功能可以进行各种检查，但目前不会检查所有内容。例如，您可能需要进一步确保：

对于任何i in 1:length(opt)，opt[[i]]是长度为1的数字矢量
用CCD_ 23进行适当处理

构造对numDeriv::grad的最终函数调用的eval、parse方法可能不是最佳方法，但不幸的是，我目前想不出更好的方法。有时我在阅读一些R包的源代码时会看到match.call()、do.call()，但并不完全确定它们在做什么。也许它们是更好的替代品？但至少这个功能是有效的，老实说，经过4-5个小时的工作，总比什么都没有好。

更新：关于您的评论

我将用数据来解释h在做什么。您的示例数据帧类似于：

             x1       x2 v1
1    1.79741154 6.484015  0
2    1.59461279 3.655893  1
3    1.59738477 4.053226  1
4    3.41523605 4.079614  0
5    3.84462359 2.871799  1
.             .        .  .
.             .        .  .

列方式：

      x1       x2       v1 
1.921273 4.026466 0.500000

现在，

对于opt列表中没有可选参数的函数调用h(myfunc, "x1", df)，它返回以下点的偏导数：

             x1       x2        v1
1    1.79741154 4.026466  0.500000
2    1.59461279 4.026466  0.500000
3    1.59738477 4.026466  0.500000
4    3.41523605 4.026466  0.500000
5    3.84462359 4.026466  0.500000
.             .        .         .
.             .        .         .

它取df[["x1"]]的所有值，但复制列意味着mean(df[["x2"]])和mean(df[[v1]])。

对于函数调用h(myfunc, "x1", df, opt = list(v1 = 1))，它返回以下点的偏导数：

             x1       x2 v1
1    1.79741154 4.026466  1
2    1.59461279 4.026466  1
3    1.59738477 4.026466  1
4    3.41523605 4.026466  1
5    3.84462359 4.026466  1
.             .        .  .
.             .        .  .

它取df[["x1"]]的所有值，但复制列意味着mean(df[["x2"]])和提供的值v1 = 1。

对于函数调用h(myfunc, "x1", df, opt = list(x2 = 2, v1 = 1))，它返回以下点的偏导数：

             x1 x2 v1
1    1.79741154  2  1
2    1.59461279  2  1
3    1.59738477  2  1
4    3.41523605  2  1
5    3.84462359  2  1
.             .  .  .
.             .  .  .

其取df[["x1"]]的所有值但复制可选值x2 = 2和v1 = 1。

对于函数调用h(myfunc, "x2", df, list(x1 = 1.2))，它返回以下点的偏导数：

      x1       x2   v1
1    1.2 6.484015  0.5
2    1.2 3.655893  0.5
3    1.2 4.053226  0.5
4    1.2 4.079614  0.5
5    1.2 2.871799  0.5
.      .        .    .
.      .        .    .

它取df[["x2"]]的所有值，但复制列平均值mean(df[["v1"]]) = 0.5和可选值x1 = 1.2。

对于函数调用h(myfunc, "x1", df, opt = list(x1 = 1))将具有与h(myfunc, "x1", df)相同的效果，即h将忽略x1 = 1，因为您将"x1"传递给dx。

函数总是返回一个向量，因为我放入：

  ## prepare function call!
  text <- "numDeriv::grad(FUN, x = dat[[dx]]"

而不是

  ## prepare function call!
  text <- "numDeriv::grad(FUN, x = mean(dat[[dx]])"

我这样做是因为我不知道如果您只想返回一个值，为什么要传入数据帧df

相关内容

最新更新

热门标签：