请原谅我这是我第一次在网上提问。
首先:设置一些数据以便于提问。
location <- c(1, 2, 3, 4)
numerator_estimate <- c(625, 180, 210, 1753)
numerator_variance <- c(22165, 2451, 11610, 172968)
denominator_estimate <- c(2278 , 4742, 1115, 26892)
denominator_variance <- c(15870, 688, 7172, 1908288)
my_df <-data.frame(location, numerator_estimate, numerator_variance, denominator_estimate, denominator_variance)
此函数在给定分子和分母的估计和方差的情况下引导商的 SE
calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, iterations = 10000){
numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
quotient_sim <- numerator_sim/denominator_sim
quotient_sim_se <- sd(quotient_sim)
return(quotient_sim_se)
}
此函数计算商,并包含该函数以显示calculate_quotient_se不起作用,但另一个函数确实有效。
calculate_quotient <- function(numerator_estimate_f,denominator_estimate_f){
quotient <- numerator_estimate_f/denominator_estimate_f
}
my_df2 <- my_df %>%
mutate(quotient_se = calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000),
quotient = calculate_quotient(numerator_estimate, denominator_estimate))
my_df2
请注意,quotient_se仅适用于第一行,并且每增加一行,就会复制该 se。
它也不是这样工作的:
my_df$q_se <- calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000)
my_df
如果我像这样输入所有内容,它将起作用:
(x1 <- calculate_quotient_se(625, 22165, 2278, 15870))
(x2 <- calculate_quotient_se(180, 2451, 4742, 688))
(x3 <- calculate_quotient_se(210, 11610, 1115, 7172))
(x4 <- calculate_quotient_se(1753, 172968, 26892, 1908288))
关于如何在数据帧中获取模拟的 SE 以进行更多计算的任何建议?
my_df$quotient_se <-
apply(my_df, 1, function(x) calculate_quotient_se(x[2], x[3], x[4], x[5]))
my_df$quotient <-
apply(my_df, 1, function(x) calculate_quotient(x[2],x[4]))
如果您有一个未矢量化的函数,则可以使用 purrr::pmap
将其应用于数据集的行,这会在列表的元素(在本例中为数据框(中迭代p
arallel 中的函数。由于您希望将其简化为数值向量,因此请使用pmap_dbl
版本:
library(tidyverse)
set.seed(47) # make sampling reproducible
my_df <- data_frame(location = c(1, 2, 3, 4),
numerator_estimate = c(625, 180, 210, 1753),
numerator_variance = c(22165, 2451, 11610, 172968),
denominator_estimate = c(2278 , 4742, 1115, 26892),
denominator_variance = c(15870, 688, 7172, 1908288))
calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f,
iterations = 10000){
numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
quotient_sim <- numerator_sim/denominator_sim
quotient_sim_se <- sd(quotient_sim)
return(quotient_sim_se)
}
my_df <- my_df %>% mutate(quotient_se = pmap_dbl(.[-1], calculate_quotient_se))
my_df %>% select(location, quotient_se)
#> # A tibble: 4 x 2
#> location quotient_se
#> <dbl> <dbl>
#> 1 1. 0.0684
#> 2 2. 0.0104
#> 3 3. 0.0993
#> 4 4. 0.0160
在这种情况下,.
表示管道传入的数据,[-1]
是删除location
,不应将其传递到函数中。
另一种选择是重新排列函数,以便它可以接受矢量输入。在这种情况下,这可能意味着在内部使用矩阵。在规模上,这种方法几乎总是更快,尽管它可能会暂时使用更多内存来存储中间对象。