Iterative summarise in a dplyr pipeline in R



Consider the following simple dplyr pipeline in R:

library(dplyr)

df <- data.frame(group = rep(LETTERS[1:3], each = 5), value = rnorm(15)) %>%
  group_by(group) %>%
  mutate(rank = rank(value, ties.method = 'min'))

df %>%
  group_by(group) %>%
  summarise(mean_1 = mean(value[rank <= 1]),
            mean_2 = mean(value[rank <= 2]),
            mean_3 = mean(value[rank <= 3]),
            mean_4 = mean(value[rank <= 4]),
            mean_5 = mean(value[rank <= 5]))

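How can I avoid typing mean_i = mean(value[rank <= i]) for every i without falling back to a loop over the groups and over i? Specifically, is there a neat way to create variables iteratively with the dplyr::summarise function?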

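What you are computing here is a cumulative mean. dplyr has a cummean function for this: sort each group by rank, take the cumulative mean of value, then reshape the result to wide format. Note that this matches mean(value[rank <= i]) only when the ranks within each group are distinct, which is the case for the continuous rnorm values used here.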

library(tidyverse)

df %>%
  arrange(group, rank) %>%
  group_by(group) %>%
  mutate(value = cummean(value)) %>%
  pivot_wider(names_from = rank, values_from = value, names_prefix = 'mean_')
#  group mean_1 mean_2  mean_3  mean_4  mean_5
#  <chr>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
#1 A     -0.560 -0.395 -0.240  -0.148   0.194 
#2 B     -1.27  -0.976 -0.799  -0.484  -0.0443
#3 C     -0.556 -0.223 -0.0284  0.0789  0.308 
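To make the cumulative-mean idea concrete: the running mean of the first k values is just their running sum divided by k. The sketch below is not part of the original answer; it writes that out with base R's cumsum and seq_along instead of cummean, and compares one column against the explicit mean(value[rank <= 3]). The names running, explicit and from_running are made up for illustration, and the check assumes there are no tied ranks.

```r
library(dplyr)

set.seed(123)
df <- data.frame(group = rep(LETTERS[1:3], each = 5), value = rnorm(15)) %>%
  group_by(group) %>%
  mutate(rank = rank(value, ties.method = "min")) %>%
  ungroup()

# Running mean within each group, after sorting by rank:
# sum of the first k values divided by k.
running <- df %>%
  arrange(group, rank) %>%
  group_by(group) %>%
  mutate(running_mean = cumsum(value) / seq_along(value)) %>%
  ungroup()

# The explicit version from the question, for i = 3 only.
explicit <- df %>%
  group_by(group) %>%
  summarise(mean_3 = mean(value[rank <= 3]))

# Running mean at rank 3, one value per group (groups are in the same order).
from_running <- running %>%
  filter(rank == 3) %>%
  pull(running_mean)

# Should agree up to floating-point tolerance when ranks are distinct.
all.equal(from_running, explicit$mean_3)
```

If ties were present, several rows would share a rank and the two computations could differ, so the explicit form would be the safer one in that case.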

If the cumulative mean is only an example and you want a general solution, you can use map:

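n <- max(df$rank)

# One summarised data frame per cutoff i, each with a column named mean_i
map(seq_len(n), ~ df %>%
      group_by(group) %>%
      summarise(!!paste0('mean_', .x) := mean(value[rank <= .x]))) %>%
  # Join the per-cutoff results into one wide data frame
  reduce(inner_join, by = 'group')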
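As a possible variation on the map approach (a sketch, not something the original answer proposes), the expressions can be built up front and spliced into a single summarise call with !!!, which avoids the repeated joins. It relies on rlang::expr to capture each expression with the cutoff i injected; mean_exprs and res are illustrative names only.

```r
library(dplyr)

set.seed(123)
df <- data.frame(group = rep(LETTERS[1:3], each = 5), value = rnorm(15)) %>%
  group_by(group) %>%
  mutate(rank = rank(value, ties.method = "min")) %>%
  ungroup()

n <- max(df$rank)

# Build one unevaluated expression per cutoff: mean(value[rank <= i]).
# !!i injects the current value of i into the captured expression.
mean_exprs <- lapply(seq_len(n), function(i) {
  rlang::expr(mean(value[rank <= !!i]))
})
names(mean_exprs) <- paste0("mean_", seq_len(n))

# Splice the named expressions into a single summarise() call.
res <- df %>%
  group_by(group) %>%
  summarise(!!!mean_exprs)

res
```

Because the names of the list become the column names, changing the paste0 prefix or swapping mean for another summary function is a one-line change.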

Data (seeded so that the output shown above is reproducible):

set.seed(123)
df <- data.frame(group = rep(LETTERS[1:3], each = 5), value = rnorm(15)) %>%
  group_by(group) %>%
  mutate(rank = rank(value, ties.method = 'min'))
