I want to calculate the sum of value in a data frame for each year (2005, 2006, 2007) and category (a, b, c).
year <- c(2005,2005,2005,2006,2006,2006,2007,2007,2007)
category <- c("a","a","a","b","b","b","c","c","c")
value <- c(3,6,8,9,7,4,5,8,9)
df <- data.frame(year, category,value, stringsAsFactors = FALSE)
The table should look like this:
year | category | value |
---|---|---|
2005 | a | 1 |
2005 | a | 1 |
2005 | a | 1 |
2006 | b | 2 |
2006 | b | 2 |
2006 | b | 2 |
2007 | c | 3 |
2007 | c | 3 |
2007 | c | 3 |
2006 | a | 3 |
2007 | b | 6 |
2008 | c | 9 |
Using the dplyr package:
library(dplyr)
df %>%
  group_by(year, category) %>%
  summarise(sum = sum(value))
# # A tibble: 3 × 3
# # Groups: year [3]
# year category sum
# <dbl> <chr> <dbl>
# 1 2005 a 17
# 2 2006 b 20
# 3 2007 c 22
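The summarise() call above returns only the totals, while the question asks for them as extra rows below the original data. A minimal sketch of one way to do that with dplyr, assuming the totals are stored in the same `value` column so that `bind_rows()` can stack them (the names `totals` and `result` are illustrative, and `.groups = "drop"` assumes dplyr 1.0.0 or later):

```r
library(dplyr)

year <- c(2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007)
category <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
value <- c(3, 6, 8, 9, 7, 4, 5, 8, 9)
df <- data.frame(year, category, value, stringsAsFactors = FALSE)

# One row per (year, category) group holding the group total.
# The total is named `value` so it lines up with the original column.
totals <- df %>%
  group_by(year, category) %>%
  summarise(value = sum(value), .groups = "drop")

# Stack the totals below the original nine rows
result <- bind_rows(df, totals)
```

Here `result` has the nine original rows followed by one total row per group.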
If you want to add the sum as a new column rather than collapsing the rows, replace summarise() with mutate():
df %>%
  group_by(year, category) %>%
  mutate(sum = sum(value))
# # A tibble: 9 × 4
# # Groups: year, category [3]
# year category value sum
# <dbl> <chr> <dbl> <dbl>
# 1 2005 a 3 17
# 2 2005 a 6 17
# 3 2005 a 8 17
# 4 2006 b 9 20
# 5 2006 b 7 20
# 6 2006 b 4 20
# 7 2007 c 5 22
# 8 2007 c 8 22
# 9 2007 c 9 22
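For comparison, the same per-row group total can be sketched in base R with `ave()`, which applies a function within each combination of the grouping vectors and returns a vector as long as the input. This is an alternative to the mutate() version, not part of the original answer; the name `df_with_sum` is illustrative.

```r
year <- c(2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007)
category <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
value <- c(3, 6, 8, 9, 7, 4, 5, 8, 9)
df <- data.frame(year, category, value, stringsAsFactors = FALSE)

# Copy the data frame and add a column with the total of `value`
# for each row's (year, category) group
df_with_sum <- df
df_with_sum$sum <- ave(df$value, df$year, df$category, FUN = sum)
```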
A base R solution using aggregate:
rbind( df, aggregate( value ~ year + category, df, sum ) )
year category value
1 2005 a 3
2 2005 a 6
3 2005 a 8
4 2006 b 9
5 2006 b 7
6 2006 b 4
7 2007 c 5
8 2007 c 8
9 2007 c 9
10 2005 a 17
11 2006 b 20
12 2007 c 22
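The rbind() result above places all totals at the bottom. If each total should instead follow the rows of its own group, one possible sketch is to flag the total rows and sort on that flag last. The `is_total` column and the names `combined` and `sorted` are hypothetical additions for illustration.

```r
year <- c(2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007)
category <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
value <- c(3, 6, 8, 9, 7, 4, 5, 8, 9)
df <- data.frame(year, category, value, stringsAsFactors = FALSE)

# Append the per-group totals, as in the aggregate() call above
combined <- rbind(df, aggregate(value ~ year + category, df, sum))

# Flag the appended rows (hypothetical helper column)
combined$is_total <- seq_len(nrow(combined)) > nrow(df)

# Sort by group, with the total row (TRUE) after the data rows (FALSE)
sorted <- combined[order(combined$year, combined$category, combined$is_total), ]
rownames(sorted) <- NULL
```

The flag also makes it easy to tell data rows from totals later, for example with `sorted[!sorted$is_total, ]`.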