R - How to avoid inlined expressions when combining map with a tidyeval modeling wrapper



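I'm trying to combine a flexible modeling function (written with tidyeval) with mapping over the data in a nested data frame, and to learn tidy evaluation along the way. I'm running into a problem with expressions being inlined in the captured call (I think). Any advice, examples, tips, or best practices for writing wrappers that simplify repetitive modeling tasks and then using them with purrr::map and the like?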

The example below is based on the partial modeling-function wrapper in "20 Evaluation | Advanced R", applied to the mtcars data.

library(rlang)
library(tidyverse)
lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {

  traits <- enexpr(traits)
  resp <- enexpr(resp)
  data <- enexpr(data)
  dots <- enexprs(...)
  lm_call <- inject(lm(!!resp ~ !!traits, data = !!data, !!!dots), env)

  return(lm_call)
}

The wrapper function works for a single case:

lm_wrap(traits = hp, data = mtcars, resp = mpg)
#Call:
#lm(formula = mpg ~ hp, data = mtcars)
#Coefficients:
#(Intercept)           hp  
# 30.09886     -0.06823

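But it seems to run into the problem of inlined expressions when mapped, at least judging by a somewhat related example in "20 Evaluation | Advanced R": the recorded call contains the entire data set instead of a name.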

mt_nested <- mtcars %>% group_by(cyl) %>% nest() %>%
  mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))
mt_nested$model[[1]]$call
#lm(formula = mpg ~ hp, data = list(mpg = c(21, 21, 21.4, 18.1, 
#19.2, 17.8, 19.7), disp = c(160, 160, 258, 225, 167.6, 167.6, 
#145), hp = c(110, 110, 110, 105, 123, 123, 175), drat = c(3.9, 
#3.9, 3.08, 2.76, 3.92, 3.92, 3.62), wt = c(2.62, 2.875, 3.215, 
#3.46, 3.44, 3.44, 2.77), qsec = c(16.46, 17.02, 19.44, 20.22, 
#18.3, 18.9, 15.5), vs = c(0, 0, 1, 1, 1, 1, 0), am = c(1, 1, 
#0, 0, 0, 0, 1), gear = c(4, 4, 3, 3, 4, 4, 5), carb = c(4, 4, 
#1, 1, 4, 4, 6)))

Thanks in advance,

M.

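The problem is that you're trying to mix two different environments: the caller's, where the variables used in the formula may be defined, and your function's, where data has been passed in.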

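One solution is to create the formula separately, injecting the expressions and evaluating in env, and then call lm() in the local environment. Also note that enexprs(...) is going to break in various non-obvious ways. Instead, I just pass the dots through to lm():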

lm_wrap <- function(data, traits, resp, ..., env = caller_env()) {
  traits <- enexpr(traits)
  resp <- enexpr(resp)
  # First create the formula in the right environment.
  # Formulas keep track of the env they've been created in.
  f <- inject(!!resp ~ !!traits, env)

  # Now inject the formula inside a local call
  inject(lm(!!f, data = data, ...))
}

The second round of injection ensures that the formula itself is recorded in the call, rather than the symbol f.
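The comment in the code above says that a formula keeps track of the environment it was created in. As a minimal base R sketch of why that matters for lm() (the helpers make_formula and fit_elsewhere and the variable w are hypothetical names, not part of the original answer): a variable that is missing from data is looked up in the formula's environment, not in the function that happens to call lm().

```r
# Hypothetical helper: builds a formula whose right-hand side refers to `w`,
# a variable that exists only in this function's execution environment.
make_formula <- function() {
  w <- mtcars$wt * 2
  mpg ~ w
}

# Hypothetical helper: fits the model from a different function, where `w`
# is not defined.
fit_elsewhere <- function(f, data) {
  lm(f, data = data)
}

f <- make_formula()

# The formula carries its creation environment along with it ...
exists("w", envir = environment(f), inherits = FALSE)

# ... so lm() still finds `w`, even though it is neither a column of mtcars
# nor a variable inside fit_elsewhere().
fit <- fit_elsewhere(f, mtcars)
names(coef(fit))
```

This is the reason for building the formula in env first: the symbols in it stay resolvable in the caller's scope, while data is resolved inside the wrapper.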

You can build the call with data passed as a quoted symbol:

library(rlang)
library(tidyverse)
lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {

  traits <- enexpr(traits)
  resp <- enexpr(resp)
  dots <- enexprs(...)

  formula <- inject(formula(!!resp ~ !!traits, env = env), env)

  do.call("lm", c(formula = formula, data = quote(data), inject(!!dots, env)))
}

mt_nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))
mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = data)
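
To see in isolation why quoting data keeps the recorded call short, here is a minimal sketch using only base R (the function compare_calls and the variable d are hypothetical names chosen for illustration). When do.call() receives the data frame itself, the whole object is embedded in the call; when it receives a quoted symbol, only the name is recorded, and lm() looks it up when it evaluates the call.

```r
# Hypothetical helper comparing the two ways of passing data to do.call().
compare_calls <- function() {
  d <- head(mtcars)

  # The data frame is evaluated first, so its values end up inside the call.
  inlined <- do.call("lm", list(formula = mpg ~ hp, data = d))

  # Only the symbol `d` is passed; lm() resolves it in this function's frame,
  # which is the default `envir` of do.call().
  quoted <- do.call("lm", list(formula = mpg ~ hp, data = quote(d)))

  list(inlined = inlined$call, quoted = quoted$call)
}

calls <- compare_calls()

# The `data` argument of the first call is a full data frame,
# while in the second call it is just a symbol.
is.data.frame(calls$inlined$data)
is.symbol(calls$quoted$data)
```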

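If you want the call to contain the expression that was supplied for data, rather than always saying data, you can use substitute(data) and evaluate the call in the parent frame. You can do this in base R: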

lm_wrap <- function(data, traits, resp, ...) {

  f <- paste(deparse(substitute(resp)), deparse(substitute(traits)), sep = "~")
  f <- as.formula(f)
  do.call("lm", c(f, substitute(data), ...), envir = parent.frame())
}

Testing this, the value of data in the call object is .x[[i]], which is how the chunk of data is referred to inside the body of map:

mt_nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))
mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = .x[[i]])

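If we call the function directly, we get the expected mtcars in the call. Note that the arguments are matched by position here, so traits = mpg and resp = hp, which is why the formula reads hp ~ mpg:
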
lm_wrap(mtcars, mpg, hp)$call
#> lm(formula = hp ~ mpg, data = mtcars)
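
The base R version relies on two things: substitute() returns the expression the caller typed for an argument, and that expression can later be evaluated in the caller's frame. A minimal sketch of both steps, with show_data_expr and eval_in_caller as hypothetical helper names:

```r
# Hypothetical helper: returns the unevaluated expression supplied as `data`.
show_data_expr <- function(data) {
  substitute(data)
}

# A plain name is captured as a symbol, a larger expression as a call.
show_data_expr(mtcars)
show_data_expr(subset(mtcars, cyl == 4))

# Hypothetical helper: captures the expression, then evaluates it in the
# caller's frame, similar in spirit to do.call(..., envir = parent.frame()).
eval_in_caller <- function(data) {
  expr <- substitute(data)
  eval(expr, parent.frame())
}

four_cyl <- eval_in_caller(subset(mtcars, cyl == 4))
nrow(four_cyl)
```

Because the captured expression is whatever the caller wrote, it shows up as .x[[i]] when the caller is map, and as mtcars when the function is called directly.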

Created on 2022-11-09 with reprex v2.0.2

I guess you want something like this:

library(rlang)
library(tidyverse)

lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {
  traits <- enexpr(traits)
  resp <- enexpr(resp)
  data <- enexpr(data)
  dots <- enexprs(...)
  lm_call <- inject(lm(!!resp ~ !!traits, data = !!data, !!!dots), env)
  return(lm_call)
}
mt_nested <- mtcars %>% group_by(cyl) %>%
  group_modify( ~ tibble(
    data = list(.x),
    model = list(lm_wrap(mtcars %>% filter(cyl==!!.y$cyl), resp = mpg, traits = hp))))
mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = mtcars %>% filter(cyl == 4))
