I can mutate on a grouped field to get a per-group min or max value, like so:
library(tidyverse)
diamonds %>% group_by(cut, color) %>% mutate(best_price = max(price))
# A tibble: 53,940 x 11
# Groups: cut, color [35]
carat cut color clarity depth table price x y z best_price
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 18729
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 18477
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 18236
4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 18823
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 18325
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 18430
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 18500
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 18803
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 15584
10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 18803
Suppose I want to carry on with my grouped field and summarise:
diamonds %>% group_by(cut, color) %>% mutate(best_price = max(price)) %>% summarise(blah = sum(price))
`summarise()` regrouping output by 'cut' (override with `.groups` argument)
# A tibble: 35 x 3
# Groups: cut [5]
cut color blah
<ord> <ord> <int>
1 Fair D 699443
2 Fair E 824838
3 Fair F 1194025
4 Fair G 1331126
5 Fair H 1556112
6 Fair I 819953
7 Fair J 592103
8 Good D 2254363
9 Good E 3194260
10 Good F 3177637
# … with 25 more rows
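I hoped/expected to see best_price included here, but it does not make it through summarise. How can I adjust my chain to include the best_price field I created earlier for each group?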
Since max is computed as a single value, we can put it into the grouping with .add = TRUE:
library(dplyr)
diamonds %>%
group_by(cut, color) %>%
group_by(best_price = max(price), .add = TRUE) %>%
summarise(blah = sum(price), .groups = 'drop')
Output:
# A tibble: 35 x 4
# cut color best_price blah
# * <ord> <ord> <int> <int>
# 1 Fair D 18823 699443
# 2 Fair E 18823 824838
# 3 Fair F 18823 1194025
# 4 Fair G 18823 1331126
# 5 Fair H 18823 1556112
# 6 Fair I 18823 819953
# 7 Fair J 18823 592103
# 8 Good D 18823 2254363
# 9 Good E 18823 3194260
#10 Good F 18823 3177637
# … with 25 more rows
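Note that best_price is 18823 in every row of the output above, whereas the mutate() output in the question shows different per-group values (for example 15584 for Fair/E). This suggests that an expression written inside group_by() may be evaluated on the ungrouped data, at least in the dplyr version used here. One way to be sure the per-group maximum is carried through is to compute it with mutate() first and then add the existing column to the grouping. A minimal sketch, assuming dplyr 1.0.0 or later; `toy` and its values are made up for illustration and are not taken from diamonds:

```r
library(dplyr)

# Hypothetical toy data standing in for diamonds
toy <- data.frame(
  cut   = c("Fair", "Fair", "Good", "Good", "Good"),
  color = c("D", "D", "D", "E", "E"),
  price = c(10, 30, 20, 50, 40)
)

per_group <- toy %>%
  group_by(cut, color) %>%
  mutate(best_price = max(price)) %>%     # maximum within each cut/color group
  group_by(best_price, .add = TRUE) %>%   # add the existing column to the grouping
  summarise(blah = sum(price), .groups = "drop")

print(per_group)
```

Because best_price is constant within each cut/color group, adding it to the grouping does not split any group further; it only keeps the column in the summarised result.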
Alternatively, compute both columns in a single summarise call:
diamonds %>%
group_by(cut, color) %>%
summarise(best_price = max(price), blah = sum(price), .groups = 'drop')
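If the mutate() step from the question has to stay in the chain (say, because best_price is used in between), the already-computed column can also be collapsed inside summarise() with first(), since it holds the same value on every row of a group. A minimal sketch under that assumption; `toy` is again a hypothetical data frame, not diamonds:

```r
library(dplyr)

# Hypothetical toy data standing in for diamonds
toy <- data.frame(
  cut   = c("Fair", "Fair", "Good", "Good", "Good"),
  color = c("D", "D", "D", "E", "E"),
  price = c(10, 30, 20, 50, 40)
)

kept <- toy %>%
  group_by(cut, color) %>%
  mutate(best_price = max(price)) %>%
  summarise(
    best_price = first(best_price),  # constant within the group, so any row will do
    blah = sum(price),
    .groups = "drop"
  )

# Same result as computing both columns directly in summarise()
direct <- toy %>%
  group_by(cut, color) %>%
  summarise(best_price = max(price), blah = sum(price), .groups = "drop")

print(kept)
```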