Here is a table:
origin fans
USA 67
UK 56
GERMANY 56
USA 55
UK 76
GERMANY 43
USA 51
UK 51
GERMANY 48
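This data frame is called music_fans. How can I add a column holding the total number of fans for each country, so that the third column looks like this: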
origin fans total_fans
USA 67 173
UK 56 183
GERMANY 56 147
USA 55 173
UK 76 183
GERMANY 43 147
USA 51 173
UK 51 183
GERMANY 48 147
You can compute the per-group sum with dplyr:
library(dplyr)
music_fans %>%
group_by(origin) %>%
mutate(total_fans = sum(fans, na.rm = TRUE))
  origin   fans total_fans
  <chr>   <int>      <int>
1 USA        67        173
2 UK         56        183
3 GERMANY    56        147
4 USA        55        173
5 UK         76        183
6 GERMANY    43        147
7 USA        51        173
8 UK         51        183
9 GERMANY    48        147
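As a follow-up to the dplyr approach, here is a minimal sketch of two variations. It assumes the nine-row music_fans data from the question; the names with_totals and with_totals2 are only illustrative. The first drops the grouping after the mutate, the second uses add_count() with a weight to get the same column.

```r
library(dplyr)

# Sample data matching the question (nine rows)
music_fans <- data.frame(
  origin = c("USA", "UK", "GERMANY", "USA", "UK", "GERMANY", "USA", "UK", "GERMANY"),
  fans   = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# Grouped mutate, then ungroup() so later steps are not applied per group
with_totals <- music_fans %>%
  group_by(origin) %>%
  mutate(total_fans = sum(fans, na.rm = TRUE)) %>%
  ungroup()

# add_count() with wt = fans sums fans per origin instead of counting rows
with_totals2 <- music_fans %>%
  add_count(origin, wt = fans, name = "total_fans")
```

Both keep the original row order, so either should be fine if the grouped tibble itself is not needed afterwards.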
Or in base R:
# ave() treats extra arguments as grouping variables, so na.rm goes inside FUN
music_fans$total_fans <- ave(music_fans$fans, music_fans$origin, FUN = function(x) sum(x, na.rm = TRUE))
Data:
music_fans <- structure(list(origin = c("USA", "UK", "GERMANY", "USA", "UK",
"GERMANY", "USA", "UK", "GERMANY"), fans = c(67L, 56L, 56L, 55L, 76L,
43L, 51L, 51L, 48L)), class = "data.frame", row.names = c(NA, -9L))
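For the base R route, another possible sketch is to compute the totals once with tapply() and then look them up by country name. This assumes the same nine-row data; the intermediate name totals is just illustrative.

```r
# Sample data matching the question (nine rows)
music_fans <- data.frame(
  origin = c("USA", "UK", "GERMANY", "USA", "UK", "GERMANY", "USA", "UK", "GERMANY"),
  fans   = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# One total per country, named by country
totals <- tapply(music_fans$fans, music_fans$origin, sum, na.rm = TRUE)

# Look up each row's country in the named totals; as.vector() drops the names
music_fans$total_fans <- as.vector(totals[music_fans$origin])
```

Because the lookup is done by name, the original row order is preserved.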
Here is a data.table approach:
library(data.table)
library(dplyr)  # provides %>% and left_join()
setDT(music_fans)[, .(total_fans = sum(fans)), by = 'origin'] %>%
  left_join(music_fans, by = 'origin')
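If the join is not needed, a data.table-only sketch is to add the column by reference with `:=`. This assumes the same nine-row data and is a different technique from the aggregate-then-join shown above.

```r
library(data.table)

# Sample data matching the question (nine rows)
music_fans <- data.table(
  origin = c("USA", "UK", "GERMANY", "USA", "UK", "GERMANY", "USA", "UK", "GERMANY"),
  fans   = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# := adds total_fans in place; by = origin computes the sum within each country
music_fans[, total_fans := sum(fans, na.rm = TRUE), by = origin]
```

This keeps the original row and column order and avoids the dplyr dependency.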