Running a custom summary function by group in R



This is my first time posting a question here, so please bear with me. If you have tips for making my question clearer, please let me know.

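I am trying to write a function that summarises a given column by group ("c", "e"), with the grouping factor initialized as shown below. However, when I pass the arguments (df, x) to the function, the output seems to ignore the grouping factor. How can I make sure the grouping is respected when applying a custom summary function?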

# initialize and relevel factor
dexadf$group <- factor(dexadf$group, levels = c("c", "e"),
                       labels = c("c", "e"))
dexadf$group <- relevel(dexadf$group, ref = "c")
attributes(dexadf$group)

My data look like this; for simplicity I have included only the one column of interest (fm_bdc3):

> dput(dexadf)
structure(list(participant = c("pt04", "pt75", "pt21", "pt73", 
"pt27", "pt39", "pt43", "pt52", "pt69", "pt49", "pt50", "pt56", 
"pt62", "pt68", "pt22", "pt64", "pt54", "pt79", "pt36", "pt26", 
"pt65", "pt38"), group = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("c", "e"), class = "factor"),  
fm_bdc3 = c(18.535199635968, 23.52996574649, 17.276246451976, 
11.526088555461, 23.805048656112, 23.08597823716, 28.691020942436, 
28.968097858499, 23.378093165331, 22.491725344661, 14.609015054932, 
19.734914019306, 31.947412973684, 25.152298171274, 12.007356801787, 
20.836128108938, 22.322230884349, 14.777652101515, 21.389572717608, 
16.992853675086, 14.138189878472, 17.777235203826)), class = "data.frame", row.names = c(NA, -22L))

→ Function:

summbygrp <- function(df, x) {
  group_by(df, group) %>%
    summarise(
      count = n(),
      mean = mean(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE)
    ) %>%
    mutate(se = sd / sqrt(11),
           lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
           upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
    )
}

→ Function output:

> summbygrp(dexadf, fm_bdc3) 
# A tibble: 2 × 7
group count  mean    sd    se lower.ci upper.ci
<fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 c        11  20.6  5.48  1.65     16.9     24.3
2 e        11  20.6  5.48  1.65     16.9     24.3

As you can see, the summaries for the two groups are identical, which I know is not correct. Can anyone spot the error in my code?

Below is the output if I don't use a function, but I have many columns, so writing this out for every column would be very tedious:

group_by(dexadf, group) %>%
  summarise(
    count = n(),
    mean = mean(fm_bdc3, na.rm = TRUE),
    sd = sd(fm_bdc3, na.rm = TRUE)
  ) %>%
  mutate(se = sd / sqrt(11),
         lower.ci = mean - qt(1 - (0.05 / 2), 11 - 1) * se,
         upper.ci = mean + qt(1 - (0.05 / 2), 11 - 1) * se
  )

→ Correct output:

# A tibble: 2 × 7
group count  mean    sd    se lower.ci upper.ci
<fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 c        11  19.3  5.49  1.66     15.6     23.0
2 e        11  21.9  5.40  1.63     18.2     25.5

Answer:

library(dplyr)
library(rlang)

dexadf <- data.frame(
  stringsAsFactors = FALSE,
  participant = c("pt04","pt75","pt21","pt73",
                  "pt27","pt39","pt43","pt52","pt69","pt49","pt50",
                  "pt56","pt62","pt68","pt22","pt64","pt54","pt79",
                  "pt36","pt26","pt65","pt38"),
  fm_bdc3 = c(18.535199635968,23.52996574649,
              17.276246451976,11.526088555461,23.805048656112,
              23.08597823716,28.691020942436,28.968097858499,
              23.378093165331,22.491725344661,14.609015054932,19.734914019306,
              31.947412973684,25.152298171274,12.007356801787,
              20.836128108938,22.322230884349,14.777652101515,
              21.389572717608,16.992853675086,14.138189878472,17.777235203826),
  group = as.factor(c("c","e",
                      "e","c","c","e","c","e","c","e","e","c",
                      "e","c","c","e","e","c","e","c","e",
                      "c")),
  sex = as.factor(c("f","m",
                    "m","m","m","m","m","f","m","f","f","f",
                    "f","f","f","f","m","f","m","m","f",
                    "m"))
)

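summbygrp <- function(df, x) {
  group_by(df, group) %>%
    summarise(
      count = n(),
      # {{x}} makes dplyr look x up as a column of df, within each group
      mean = mean({{x}}, na.rm = TRUE),
      sd = sd({{x}}, na.rm = TRUE)
    ) %>%
    # use each group's own count rather than a hard-coded 11
    # (note: n() counts all rows, including any NA values)
    mutate(se = sd / sqrt(count),
           lower.ci = mean - qt(1 - (0.05 / 2), count - 1) * se,
           upper.ci = mean + qt(1 - (0.05 / 2), count - 1) * se
    )
}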
summbygrp(dexadf, fm_bdc3)
#> # A tibble: 2 × 7
#>   group count  mean    sd    se lower.ci upper.ci
#>   <fct> <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 c        11  19.3  5.49  1.66     15.6     23.0
#> 2 e        11  21.9  5.40  1.63     18.2     25.5

Created on 2022-07-09 by the reprex package (v2.0.1)
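A quick base-R check on the question's data (no dplyr needed) suggests where the identical rows in the question came from: the value 20.6 shown for both groups equals the mean of all 22 fm_bdc3 values, whereas splitting by group reproduces the correct 19.3 and 21.9. This is consistent with x being evaluated to the whole fm_bdc3 vector instead of the per-group column; that such a vector existed in the asker's session is an assumption. The sketch rebuilds the vector and group labels from the question's dput() output.

```r
# The 22 fm_bdc3 values and group labels from the question's dput() output.
fm_bdc3 <- c(18.535199635968, 23.52996574649, 17.276246451976,
             11.526088555461, 23.805048656112, 23.08597823716, 28.691020942436,
             28.968097858499, 23.378093165331, 22.491725344661, 14.609015054932,
             19.734914019306, 31.947412973684, 25.152298171274, 12.007356801787,
             20.836128108938, 22.322230884349, 14.777652101515, 21.389572717608,
             16.992853675086, 14.138189878472, 17.777235203826)
group <- factor(c("c", "e", "e", "c", "c", "e", "c", "e", "c", "e", "e",
                  "c", "e", "c", "c", "e", "e", "c", "e", "c", "e", "c"))

# Mean of the whole vector, ignoring groups: what every group gets
# when x refers to the full vector rather than the grouped column.
overall_mean <- mean(fm_bdc3)
print(round(overall_mean, 1))   # 20.6, as in both rows of the wrong output

# Per-group means, i.e. what a correctly grouped summary should give.
grp_means <- tapply(fm_bdc3, group, mean)
print(round(grp_means, 1))      # c = 19.3, e = 21.9
```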

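For this function to work you actually need to use {{ }}, pronounced "curly-curly", from the rlang package. When you want to pass variables (i.e. columns of a data set) as arguments to a function that uses dplyr or other tidyverse verbs such as mutate, summarise, group_by, etc., you need to wrap those arguments in {{ }}, as is done with x here. Otherwise the function will not work as expected and will most likely throw an error, because tidyverse verbs use NSE (non-standard evaluation). Without {{ }}, x is not looked up as a column of df; it is evaluated in the calling environment, so if an object named fm_bdc3 happens to exist there (for example after attach(dexadf)), every group is summarised with the same full vector, which would explain the identical rows. To learn more, see "Programming with dplyr"; I also recommend reading chapters 17-20 of Advanced R.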
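Since the question mentions having many columns, one option is a variant that takes the column name as a string and looks it up with the .data pronoun (described in "Programming with dplyr") instead of {{ }}, so it can be called in a loop over a character vector. This is a minimal sketch, assuming the column names are available as strings: summbygrp_chr, the toy data frame and its columns a and b are hypothetical, and the standard error uses each group's count instead of a fixed 11.

```r
library(dplyr)

# Hypothetical variant of summbygrp(): the column is passed as a string
# and looked up with the .data pronoun, so it can be looped over names.
summbygrp_chr <- function(df, var) {
  df %>%
    group_by(group) %>%
    summarise(
      count = n(),
      mean = mean(.data[[var]], na.rm = TRUE),
      sd = sd(.data[[var]], na.rm = TRUE),
      .groups = "drop"
    ) %>%
    # standard error and 95% t-interval from each group's own count
    mutate(
      se = sd / sqrt(count),
      lower.ci = mean - qt(0.975, count - 1) * se,
      upper.ci = mean + qt(0.975, count - 1) * se
    )
}

# Toy data with made-up values, standing in for a data frame with many columns.
toy <- data.frame(
  group = factor(c("c", "c", "c", "e", "e", "e")),
  a = c(1, 2, 3, 10, 20, 30),
  b = c(2, 4, 6, 1, 1, 1)
)

cols <- c("a", "b")
results <- lapply(cols, function(v) summbygrp_chr(toy, v))
names(results) <- cols

# Stack the per-column summaries into one table with a "variable" column.
all_results <- bind_rows(results, .id = "variable")
print(all_results)
```

With the real data, cols would hold the names of the columns you want to summarise; the {{ }} version from the answer remains the natural choice when you call the function by hand with a bare column name.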
