r语言 - 仅提供第一行数据结果的范围和函数



请原谅我这是我第一次在网上提问。

首先:设置一些数据以便于提问。

location <- c(1, 2, 3, 4)
numerator_estimate <- c(625, 180, 210, 1753)
numerator_variance <- c(22165, 2451, 11610, 172968)
denominator_estimate <- c(2278 , 4742, 1115, 26892)
denominator_variance <- c(15870, 688, 7172, 1908288)
my_df <-data.frame(location, numerator_estimate, numerator_variance, denominator_estimate, denominator_variance) 

此函数在给定分子和分母的估计和方差的情况下引导商的 SE

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, iterations = 10000){
  numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
  quotient_sim <- numerator_sim/denominator_sim
  quotient_sim_se <- sd(quotient_sim)
  return(quotient_sim_se)
}

此函数计算商,并包含该函数以显示calculate_quotient_se不起作用,但另一个函数确实有效。

calculate_quotient <- function(numerator_estimate_f,denominator_estimate_f){
  quotient <- numerator_estimate_f/denominator_estimate_f
}
my_df2 <- my_df %>%
  mutate(quotient_se = calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000),
         quotient = calculate_quotient(numerator_estimate, denominator_estimate))
my_df2 

请注意,quotient_se仅适用于第一行,并且每增加一行,就会复制该 se。

它也不是这样工作的:

my_df$q_se <- calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000)
my_df

如果我像这样输入所有内容,它将起作用:

(x1 <- calculate_quotient_se(625, 22165, 2278, 15870))
(x2 <- calculate_quotient_se(180, 2451, 4742, 688))
(x3 <- calculate_quotient_se(210, 11610, 1115, 7172))
(x4 <- calculate_quotient_se(1753, 172968, 26892, 1908288))

关于如何在数据帧中获取模拟的 SE 以进行更多计算的任何建议?

my_df$quotient_se <- 
    apply(my_df, 1, function(x) calculate_quotient_se(x[2], x[3], x[4], x[5]))
my_df$quotient <- 
    apply(my_df, 1, function(x) calculate_quotient(x[2],x[4]))

如果您有一个未矢量化的函数,则可以使用 purrr::pmap 将其应用于数据集的行,这会在列表的元素(在本例中为数据框(中迭代p arallel 中的函数。由于您希望将其简化为数值向量,因此请使用pmap_dbl版本:

library(tidyverse)
set.seed(47)    # make sampling reproducible
my_df <- data_frame(location = c(1, 2, 3, 4),
                    numerator_estimate = c(625, 180, 210, 1753),
                    numerator_variance = c(22165, 2451, 11610, 172968),
                    denominator_estimate = c(2278 , 4742, 1115, 26892),
                    denominator_variance = c(15870, 688, 7172, 1908288))
calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, 
                                  iterations = 10000){
    numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
    denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
    quotient_sim <- numerator_sim/denominator_sim
    quotient_sim_se <- sd(quotient_sim)
    return(quotient_sim_se)
}
my_df <- my_df %>% mutate(quotient_se = pmap_dbl(.[-1], calculate_quotient_se))
my_df %>% select(location, quotient_se)
#> # A tibble: 4 x 2
#>   location quotient_se
#>      <dbl>       <dbl>
#> 1       1.      0.0684
#> 2       2.      0.0104
#> 3       3.      0.0993
#> 4       4.      0.0160

在这种情况下,.表示管道传入的数据,[-1]是删除location,不应将其传递到函数中。

另一种选择是重新排列函数,以便它可以接受矢量输入。在这种情况下,这可能意味着在内部使用矩阵。在规模上,这种方法几乎总是更快,尽管它可能会暂时使用更多内存来存储中间对象。

最新更新