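I would like to create a list of functions in the global environment and call them as needed inside a call to mutate or summarise, so that my dplyr code is less verbose. The problem is that the functions must use variables defined in the data frame, not in the global environment. This probably all comes down to object scoping, which I find a bit tricky.
For all of the code below, load the required libraries: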
library(dplyr)
library(purrr)
library(rlang)
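For example: with the mtcars dataset, I want to group_by one variable and summarise with the following three functions: any_vs_four_gears, any_am_high_hp, and all_combined.
I can define them inside the call to summarise as follows, which works fine: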
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = any(vs == 1 & gear == 4),
any_am_high_hp = any(am == 1 & hp >170),
all_combined = all(any_vs_four_gears, any_am_high_hp))
# # A tibble: 6 × 4
#    carb any_vs_four_gears any_am_high_hp all_combined
#   <dbl> <lgl>             <lgl>          <lgl>
# 1     1 TRUE              FALSE          FALSE
# 2     2 TRUE              FALSE          FALSE
# 3     3 FALSE             FALSE          FALSE
# 4     4 TRUE              TRUE           TRUE
# 5     6 FALSE             TRUE           FALSE
# 6     8 FALSE             TRUE           FALSE
I can also define the functions as expressions and then evaluate those expressions inside the call to summarise, like this:
expressions_as_strings <- list(any_vs_four_gears = 'any(vs == 1 & gear == 4)',
any_am_high_hp = 'any(am == 1 & hp >170)',
all_combined = 'all(any_vs_four_gears, any_am_high_hp)')
expressions <- map(expressions_as_strings, parse_expr)
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = !!expressions$any_vs_four_gears,
any_am_high_hp = !!expressions$any_am_high_hp,
all_combined = !!expressions$all_combined)
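As a side note on the expression-based approach above, a named list of expressions can usually be spliced into summarise in one go with rlang's `!!!` operator, which avoids repeating each name. This is a minimal sketch rather than part of the original question; it assumes the names of the list are meant to become the output column names.

```r
library(dplyr)
library(rlang)

# Same expressions as above, parsed from strings into R expressions.
expressions <- list(
  any_vs_four_gears = parse_expr("any(vs == 1 & gear == 4)"),
  any_am_high_hp    = parse_expr("any(am == 1 & hp > 170)"),
  all_combined      = parse_expr("all(any_vs_four_gears, any_am_high_hp)")
)

# `!!!` splices the named list so each element becomes one named
# argument of summarise(); later expressions can use earlier columns.
result <- mtcars %>%
  group_by(carb) %>%
  summarise(!!!expressions)

print(result)
```

The trade-off is the same as before: these are still expressions, not functions, so they cannot take arguments.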
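However, I feel I would gain more flexibility if I could define functions instead of expressions.
I tried a couple of approaches without success:
Method 1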
method_1 <- list(any_vs_four_gears = function() any(vs == 1 & gear == 4),
any_am_high_hp = function() any(am == 1 & hp >170),
all_combined = function() all(any_vs_four_gears, any_am_high_hp))
#example
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = method_1$any_vs_four_gears())
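Method 1 fails. I think this is because the functions look up the values of vs and gear in the global environment rather than in the data.
Method 2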
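The scoping behavior behind this failure can be illustrated with base R alone. The sketch below is only an illustration: the function name `read_vs` and the toy data frame are hypothetical, and `with()` stands in for the data masking that summarise performs.

```r
# A variable named `vs` in the calling environment.
vs <- "global value"

# A function defined at top level: free variables in its body are looked
# up in the environment where the function was defined.
read_vs <- function() vs

# A one-row data frame that also has a column named `vs`.
df <- data.frame(vs = "column value")

# Evaluating the bare name inside with() finds the column.
direct <- with(df, vs)

# Calling the function inside with() does not: the function body is
# evaluated in the function's own environment chain, not in the data.
via_function <- with(df, read_vs())

print(direct)
print(via_function)
```

In the question's Method 1 there is no global `vs` at all, so the same lookup ends in an "object not found" error instead of returning a global value.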
method_2 <- list(any_vs_four_gears = function(var1, var2) {any({{var1}} == 1 & {{var2}} == 4)},
any_am_high_hp = function(var1, var2) {any({{var1}} == 1 & {{var2}} > 170)},
all_combined = function(var1, var2) {all({{var1}}, {{var2}})})
# example
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = method_2$any_vs_four_gears(vs, gear))
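Method 2 does work, but I have to pass the variables to the functions as arguments, and I would like to avoid that.
Main question
Is there a way to create a function that uses the variables in the data frame without taking the variable names as arguments? What I want is something like method_1, as in this pseudocode: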
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = method_x$any_vs_four_gears(),
any_am_high_hp = method_x$any_am_high_hp(),
all_combined = method_x$all_combined())
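Up front: I generally advise against writing functions that break functional reproducibility, because I have spent too much time troubleshooting functions whose behavior changes based on things that were never passed to them.
That said, try the following: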
method_1 <- list(
any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4)),
any_am_high_hp = function(data = cur_data()) with(data, any(am == 1 & hp > 170)),
all_combined = function(data = cur_data()) with(data, all(any_vs_four_gears, any_am_high_hp))
)
mtcars %>%
group_by(carb) %>%
summarise(
any_vs_four_gears = method_1$any_vs_four_gears(),
any_am_high_hp = method_1$any_am_high_hp(),
all_combined = method_1$all_combined()
)
# # A tibble: 6 x 4
# carb any_vs_four_gears any_am_high_hp all_combined
# <dbl> <lgl> <lgl> <lgl>
# 1 1 TRUE FALSE FALSE
# 2 2 TRUE FALSE FALSE
# 3 3 FALSE FALSE FALSE
# 4 4 TRUE TRUE TRUE
# 5 6 FALSE TRUE FALSE
# 6 8 FALSE TRUE FALSE
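This uses dplyr's cur_data() pronoun/function, which is available inside the pipeline's data-masking context, and adds only a little surrounding code (with(data, { ... })), so it is friendly to {-expressions "as is".
The errors are not hard to interpret: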
mtcars %>%
select(-vs) %>% # intentionally setting up an error
group_by(carb) %>%
summarise(
any_vs_four_gears = method_1$any_vs_four_gears(),
any_am_high_hp = method_1$any_am_high_hp(),
all_combined = method_1$all_combined()
)
# Error: Problem with `summarise()` column `any_vs_four_gears`.
# i `any_vs_four_gears = method_1$any_vs_four_gears()`.
# x object 'vs' not found
# i The error occurred in group 1: carb = 1.
# Run `rlang::last_error()` to see where the error occurred.
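If many such functions are needed, the repeated `function(data = cur_data()) with(data, ...)` wrapper could be generated by a small factory. This is a sketch only, not part of the answer above: `make_summariser` and `method_x` are hypothetical names, and it uses rlang's `enexpr()` and `eval_tidy()` in place of `with()`. Note also that cur_data() is deprecated in recent dplyr releases (1.1.0 and later), so a deprecation warning is to be expected there.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: captures an unevaluated expression and returns a
# zero-argument-style function that evaluates it against the current
# group's data when called inside summarise().
make_summariser <- function(expr) {
  expr <- enexpr(expr)
  function(data = cur_data()) eval_tidy(expr, data = data)
}

method_x <- list(
  any_vs_four_gears = make_summariser(any(vs == 1 & gear == 4)),
  any_am_high_hp    = make_summariser(any(am == 1 & hp > 170))
)

result <- mtcars %>%
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_x$any_vs_four_gears(),
    any_am_high_hp    = method_x$any_am_high_hp()
  )

print(result)
```

The same caveat from the answer applies: these functions depend on data that is never passed to them explicitly, so they only make sense inside a dplyr verb.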