Is there a way to use R's approx function where the values to interpolate against depend on other columns in the data frame?



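I want to use the approx function to add a pct_answer column to my table called 'data' in the example below. The pct_answer column is calculated by linear interpolation from a table called 'lookup'.

Which lookup values are interpolated depends on the values of 'account' and 'pattern_num' in the data table.

I have managed to produce the answer I need by writing a function and then applying it to every row of the data frame with purrr::map (see the reproducible example below). However, in the real problem I am trying to solve, the data has millions of rows, and this takes a very long time to run.

Is there a faster way to get the same result?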

library(tidyverse)

data <- tibble(
  account = c(rep("a", 5), rep("b", 7), rep("c", 7)),
  pattern_num = c(rep(1, 2), rep(2, 3), rep(1, 11), rep(2, 3)),
  pct_lookup = c(seq(0.05, 0.25, by = 0.05), seq(0.5, 0.8, by = 0.05), seq(0.65, 0.95, by = 0.05))
)

lookup <- tibble(
  account = c(rep("a", 20), rep("b", 10)),
  pattern_num = c(rep(1, 10), rep(2, 10), rep(1, 10)),
  pct_lookup = rep(seq(0.1, 1, by = 0.1), 3),
  norm_mean = c(rep(0, 10), rep(0.5, 10), rep(0.25, 10))
) %>%
  mutate(pct_answer = map2_dbl(pct_lookup, norm_mean, pnorm)) %>%
  dplyr::select(-norm_mean)
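
# Interpolate lookup_num against the lookup rows for one account/pattern pair.
lookup_function <- function(account_name, pattern_number, lookup_num) {
  # .env$ makes it explicit that these are function arguments, not lookup columns
  lookup_table <- lookup %>%
    filter(account == .env$account_name, pattern_num == .env$pattern_number)

  # No matching lookup rows: return a numeric NA so the result type stays double
  if (nrow(lookup_table) == 0) {
    return(NA_real_)
  }

  # rule = 2 clamps values outside the lookup range to the nearest endpoint
  approx(lookup_table$pct_lookup, lookup_table$pct_answer,
         xout = lookup_num, ties = "ordered", rule = 2)$y
}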

# map2_dbl returns a plain numeric column instead of a list column
data <- data %>%
  rowwise() %>%
  mutate(pct_answer = map2_dbl(account, pattern_num, lookup_function, lookup_num = pct_lookup)) %>%
  ungroup()

You can try the following data.table approach:

library(data.table)
setDT(data)
data[, pct_answer := mapply(lookup_function, account, pattern_num, pct_lookup)]
data

Using Map:

library(data.table)
setDT(data)[, pct_answer := unlist(Map(lookup_function, account, pattern_num, pct_lookup)])
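
Both variants above still call lookup_function once per row, so the lookup table is filtered again for every row. Since approx is vectorized over xout, another option is to interpolate once per (account, pattern_num) group. The following is only a minimal base R sketch of that idea, not a benchmarked solution. The helper name interpolate_by_group is hypothetical, the sample tables are rebuilt as plain data frames so the block runs on its own, and it assumes the lookup rows are already sorted by pct_lookup within each group (as ties = "ordered" requires). It also returns NA for groups with fewer than two lookup rows, where approx itself would raise an error.

```r
# Rebuild the sample tables from the question as plain data frames.
data <- data.frame(
  account = c(rep("a", 5), rep("b", 7), rep("c", 7)),
  pattern_num = c(rep(1, 2), rep(2, 3), rep(1, 11), rep(2, 3)),
  pct_lookup = c(seq(0.05, 0.25, by = 0.05), seq(0.5, 0.8, by = 0.05),
                 seq(0.65, 0.95, by = 0.05))
)
lookup <- data.frame(
  account = c(rep("a", 20), rep("b", 10)),
  pattern_num = c(rep(1, 10), rep(2, 10), rep(1, 10)),
  pct_lookup = rep(seq(0.1, 1, by = 0.1), 3)
)
# pnorm is vectorized over both q and mean, so no map2_dbl is needed here.
lookup$pct_answer <- pnorm(lookup$pct_lookup,
                           mean = c(rep(0, 10), rep(0.5, 10), rep(0.25, 10)))

# Hypothetical helper: one approx() call per (account, pattern_num) group.
interpolate_by_group <- function(data, lookup) {
  # Build one string key per row from the two grouping columns.
  data_key   <- paste(data$account, data$pattern_num, sep = "|")
  lookup_key <- paste(lookup$account, lookup$pattern_num, sep = "|")

  # Split the lookup x/y values and the data row indices by key, once.
  lookup_x    <- split(lookup$pct_lookup, lookup_key)
  lookup_y    <- split(lookup$pct_answer, lookup_key)
  rows_by_key <- split(seq_along(data_key), data_key)

  result <- rep(NA_real_, nrow(data))
  for (key in names(rows_by_key)) {
    x <- lookup_x[[key]]
    # Missing group (NULL) or too few points to interpolate: leave NA.
    if (length(x) < 2L) next
    rows <- rows_by_key[[key]]
    result[rows] <- approx(x, lookup_y[[key]], xout = data$pct_lookup[rows],
                           ties = "ordered", rule = 2)$y
  }
  result
}

data$pct_answer <- interpolate_by_group(data, lookup)
```

The loop runs once per distinct account/pattern pair rather than once per row, so the number of approx calls depends on the number of groups, not on the number of rows. Whether this is fast enough for millions of rows would need to be measured on the real data.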
