R dplyr: mutate, then summarise, loses the mutated field



I can mutate on grouped data to get a per-group value such as a minimum or maximum, like so:

library(tidyverse)
diamonds %>% group_by(cut, color) %>% mutate(best_price = max(price))
# A tibble: 53,940 x 11
# Groups:   cut, color [35]
   carat cut       color clarity depth table price     x     y     z best_price
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>      <int>
 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43      18729
 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31      18477
 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31      18236
 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63      18823
 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75      18325
 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48      18430
 7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47      18500
 8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53      18803
 9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49      15584
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39      18803

Now suppose I want to carry on with the grouped data and summarise:

diamonds %>% group_by(cut, color) %>% mutate(best_price = max(price)) %>% summarise(blah = sum(price))
`summarise()` regrouping output by 'cut' (override with `.groups` argument)
# A tibble: 35 x 3
# Groups:   cut [5]
   cut   color    blah
   <ord> <ord>   <int>
 1 Fair  D      699443
 2 Fair  E      824838
 3 Fair  F     1194025
 4 Fair  G     1331126
 5 Fair  H     1556112
 6 Fair  I      819953
 7 Fair  J      592103
 8 Good  D     2254363
 9 Good  E     3194260
10 Good  F     3177637
# … with 25 more rows

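I would like (and expected) to see best_price included here, but it does not survive the summarise. How can I adjust my chain so that it keeps the best_price field I created earlier for each group?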

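Since max() returns a single value, we can add it as a grouping variable with .add = TRUE. Be aware that group_by() evaluates computed expressions on the ungrouped data, so best_price here is the overall maximum rather than the per-group one: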

library(dplyr)
diamonds %>% 
  group_by(cut, color) %>%
  group_by(best_price = max(price), .add = TRUE) %>%
  summarise(blah = sum(price), .groups = 'drop')

-output

# A tibble: 35 x 4
#   cut   color best_price    blah
# * <ord> <ord>      <int>   <int>
# 1 Fair  D          18823  699443
# 2 Fair  E          18823  824838
# 3 Fair  F          18823 1194025
# 4 Fair  G          18823 1331126
# 5 Fair  H          18823 1556112
# 6 Fair  I          18823  819953
# 7 Fair  J          18823  592103
# 8 Good  D          18823 2254363
# 9 Good  E          18823 3194260
#10 Good  F          18823 3177637
# … with 25 more rows
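
To get the per-group maximum while keeping the .add pattern, one possible sketch is to compute best_price in a grouped mutate() first and then add that existing column to the grouping. This assumes dplyr 1.0.0 or later and that diamonds comes from ggplot2; res is only an illustrative name.

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds dataset

res <- diamonds %>%
  group_by(cut, color) %>%
  # computed while the data is grouped, so this is the per-group maximum
  mutate(best_price = max(price)) %>%
  # add the already-computed column to the existing grouping
  group_by(best_price, .add = TRUE) %>%
  summarise(blah = sum(price), .groups = "drop")

res
```

Because best_price is constant within each cut/color group, adding it to the grouping does not split any group; it only carries the column through summarise().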

Another option is to compute both columns in a single summarise():

diamonds %>%
  group_by(cut, color) %>%
  summarise(best_price = max(price), blah = sum(price), .groups = 'drop')
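
If the mutate() step has to stay in the chain (for example because best_price is used in between), a hedged variant is to pick the value back up inside summarise() with first(), since it is constant within each group. The sketch below assumes dplyr 1.0.0 or later and ggplot2's diamonds; kept and direct are illustrative names, and the last line simply compares the two approaches.

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds dataset

# Keep the original mutate(), then collapse best_price with first()
kept <- diamonds %>%
  group_by(cut, color) %>%
  mutate(best_price = max(price)) %>%
  summarise(best_price = first(best_price),
            blah = sum(price),
            .groups = "drop")

# Compute both columns directly in summarise()
direct <- diamonds %>%
  group_by(cut, color) %>%
  summarise(best_price = max(price),
            blah = sum(price),
            .groups = "drop")

# Compare the two results
all.equal(kept, direct)
```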
