r - Trying to understand how eval(expr, envir = df) works



I have built a function that seems to work, but I don't understand why.

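My original problem was to take a data.frame of population counts and expand it to recreate the original population. This is easy if you know the column names in advance: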

library(tidyverse)
set.seed(121)
test_counts <- tibble(Population = letters[1:4], Length = c(1, 1, 2, 1),
                      Number = sample(1:100, 4))
expand_counts_v0 <- function(Length, Population, Number) {
  tibble(Population = Population,
         Length = rep(Length, times = Number))
}

test_counts %>% pmap_dfr(expand_counts_v0) %>%   # apply it
  group_by(Population, Length) %>%               # test it
  summarise(Number = n()) %>%
  ungroup %>%
  { all.equal(., test_counts) }
# [1] TRUE
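For comparison, when the column names are known, the same expansion can be done in base R by repeating row indices instead of building a small tibble per row. This is a minimal sketch with made-up counts (not the seeded sample() values above) and an assumed count column named Number; tidyr's uncount() is a packaged alternative for the same task.

```r
# Made-up counts for illustration (not the seeded sample() values from the question)
counts <- data.frame(
  Population = letters[1:4],
  Length = c(1, 1, 2, 1),
  Number = c(3L, 1L, 2L, 4L),
  stringsAsFactors = FALSE
)

# Repeat each row index Number times, then keep every column except the count
idx <- rep(seq_len(nrow(counts)), times = counts$Number)
expanded <- counts[idx, setdiff(names(counts), "Number")]
rownames(expanded) <- NULL

# Counting rows per Population recovers the original Number column
table(expanded$Population)
```

Because rep() is applied to row indices, every column is expanded consistently in a single step and no per-row function call is needed.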

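However, I wanted to generalise this into a function that doesn't need to know the data.frame's column names in advance, and I'm interested in NSE, so I wrote: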

test_counts1 <- tibble(Population = letters[1:4],
                       Length = c(1, 1, 2, 1),
                       Number = sample(1:100, 4),
                       Height = c(100, 50, 45, 90),
                       Width = c(700, 50, 60, 90))

expand_counts_v1 <- function(df, count = NULL) {
  countq <- enexpr(count)
  names <- df %>% select(-!!countq) %>% names
  namesq <- names %>% map(as.name)
  cols <- map(namesq, ~ expr(rep(!!., times = !!countq))) %>% set_names(namesq)
  make_tbl <- function(...) {
    expr(tibble(!!!cols)) %>% eval(envir = df)
  }
  df %>% pmap_dfr(make_tbl)
}
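To see what expand_counts_v1 actually hands to eval(), the call can be built without evaluating it. The sketch below is a base R plus rlang variant (lapply() in place of purrr::map(), syms() in place of map(as.name)); build_call and counts are hypothetical names, and the counts are made up. It shows that the result is one call whose symbols name whole columns, not values from a single row.

```r
library(rlang)

# Hypothetical helper: builds the same kind of call as expand_counts_v1,
# but returns it unevaluated so it can be inspected
build_call <- function(df, count) {
  countq <- enexpr(count)                        # the bare symbol passed in, e.g. Number
  keep <- setdiff(names(df), as_string(countq))  # every other column name
  cols <- lapply(syms(keep), function(s) expr(rep(!!s, times = !!countq)))
  names(cols) <- keep
  expr(tibble(!!!cols))                          # splice the named list in as arguments
}

# Made-up counts for illustration
counts <- data.frame(
  Population = letters[1:4],
  Length = c(1, 1, 2, 1),
  Number = c(3L, 1L, 2L, 4L)
)

built <- build_call(counts, Number)
print(built)  # a single tibble(...) call; nothing refers to a particular row
```

The call is never evaluated here, so tibble does not need to be loaded.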

However, when I tested this function, it seemed to repeat the rows 4 times:

test_counts %>% expand_counts_v1(count = Number) %>%
  group_by(Population, Length) %>%
  summarise(Number = n()) %>%
  ungroup %>%
  { sum(.$Number) / sum(test_counts$Number) }
# [1] 4

This led me to guess at a solution:

expand_counts_v2 <- function(df, count = NULL) {
  countq <- enexpr(count)
  names <- df %>% select(-!!countq) %>% names
  namesq <- names %>% map(as.name)
  cols <- map(namesq, ~ expr(rep(!!., times = !!countq))) %>% set_names(namesq)
  make_tbl <- function(...) {
    expr(tibble(!!!cols)) %>% eval(envir = df)
  }
  df %>% make_tbl
}

This seems to work:

test_counts %>% expand_counts_v2(count = Number) %>%
  group_by(Population, Length) %>%
  summarise(Number = n()) %>%
  ungroup %>%
  { all.equal(., test_counts) }
# [1] TRUE
test_counts1 %>% expand_counts_v2(count = Number) %>%
  group_by(Population, Length, Height, Width) %>%
  summarise(Number = n()) %>%
  ungroup %>%
  { all.equal(., test_counts1) }
# [1] TRUE

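But I don't understand why. How does it evaluate each row when I'm no longer using pmap? The function needs to be applied to each row in order to work, so it must be doing that somehow, but I can't see how.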

EDIT

After Artem correctly explained what was happening, I realised I could do this:

expand_counts_v2 <- function(df, count = NULL) {
  countq <- enexpr(count)
  names <- df %>% select(-!!countq) %>% names
  namesq <- names %>% map(as.name)
  cols <- map(namesq, ~ expr(rep(!!., times = !!countq))) %>% set_names(namesq)
  expr(tibble(!!!cols)) %>% eval_tidy(data = df)
}

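This gets rid of the unnecessary make_tbl function. However, as Artem pointed out, it only really works because rep is vectorised. So it works, but not by rewriting the _v0 function and pmapping over it, which is the process I was trying to replicate. Eventually I discovered rlang::new_function and wrote: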

expand_counts_v3 <- function(df, count = NULL) {
  countq <- enexpr(count)
  names <- df %>% select(-!!countq) %>% names
  namesq <- names %>% map(as.name)
  cols <- map(namesq, ~ expr(rep(!!., times = !!countq))) %>% set_names(namesq)
  all_names <- df %>% names %>% map(as.name)
  args <- rep(0, times = length(all_names)) %>% as.list %>% set_names(all_names)
  correct_function <- new_function(args,                    # this makes the function as in _v0
                                   expr(tibble(!!!cols)))
  pmap_dfr(df, correct_function)                            # applies it as in _v0
}

It's longer and probably uglier, but it works the way I originally intended.
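The row-wise idea behind expand_counts_v3 can be tried without the tidyverse. This sketch assumes rlang is installed and swaps pmap_dfr() for base lapply() plus do.call(rbind, ...); counts, row_body, row_fn, the hard-coded count column name "Number" and the NULL defaults are illustrative choices, not the author's code. Because the generated function's formals match the column names, each call only sees the values from one row.

```r
library(rlang)

# Made-up counts for illustration
counts <- data.frame(
  Population = c("a", "b"),
  Length = c(1, 2),
  Number = c(2L, 3L),
  stringsAsFactors = FALSE
)

# Body of the generated function: each non-count column repeated Number times.
# The count column name "Number" is hard-coded here (an assumption).
keep <- setdiff(names(counts), "Number")
cols <- lapply(syms(keep), function(s) expr(rep(!!s, times = Number)))
names(cols) <- keep
row_body <- expr(data.frame(!!!cols, stringsAsFactors = FALSE))

# One formal argument per column; NULL defaults here (expand_counts_v3 uses 0)
arg_names <- names(counts)
row_args <- setNames(rep(list(NULL), length(arg_names)), arg_names)
row_fn <- new_function(row_args, row_body)

# Call the generated function once per row (the job pmap_dfr() does) and stack the pieces
pieces <- lapply(seq_len(nrow(counts)), function(i) do.call(row_fn, as.list(counts[i, ])))
expanded <- do.call(rbind, pieces)
expanded
```

Unlike make_tbl(), this function reads its inputs from its own arguments, so calling it once per row does not duplicate anything.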

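The problem lies in eval(envir = df), which exposes the entire data frame to make_tbl(). Notice that you never use the ... argument inside make_tbl(). Instead, the function effectively computes the equivalent of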

with(df, tibble(Population = rep(Population, times = Number),
                Length = rep(Length, times = Number)))

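regardless of which arguments you pass to it. When you call the function through pmap_dfr(), it essentially computes the above four times (once per row) and concatenates the results row-wise, which produces the duplicated entries you observed. When you remove pmap_dfr(), the function is called only once, but because rep() is itself vectorised (try rep(test_counts$Population, test_counts$Number) to see what I mean), make_tbl() computes the entire result in one go.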
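The mechanism can be reproduced in base R. The sketch below uses made-up counts, data.frame() in place of tibble(), and lapply() plus do.call(rbind, ...) in place of pmap_dfr(); counts, e, once and per_row are hypothetical names. According to ?eval, when envir is a list or data frame it is copied into a temporary environment whose parent is enclos, so every symbol in the call resolves to a whole column.

```r
# Made-up counts for illustration
counts <- data.frame(
  Population = letters[1:4],
  Length = c(1, 1, 2, 1),
  Number = c(3L, 1L, 2L, 4L),
  stringsAsFactors = FALSE
)

# The kind of call make_tbl() evaluates: Population, Length and Number
# will be looked up as whole columns of `counts`
e <- quote(data.frame(Population = rep(Population, times = Number),
                      Length = rep(Length, times = Number),
                      stringsAsFactors = FALSE))

# make_tbl-style function: it ignores `...` and always evaluates against all of `counts`
make_tbl <- function(...) eval(e, envir = counts)

# Called once: rep() is vectorised over `times`, so one call expands every row
once <- make_tbl()

# Called once per row, as pmap_dfr() does: each call returns the full expansion again
per_row <- do.call(rbind, lapply(seq_len(nrow(counts)), function(i) {
  do.call(make_tbl, as.list(counts[i, ]))  # the row values land in `...` and are ignored
}))

nrow(once)                  # 10, i.e. sum(counts$Number)
nrow(per_row) / nrow(once)  # 4, i.e. nrow(counts)
```

The factor of 4 equals the number of rows in the data frame, which matches the ratio reported in the question.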
