R - Using dplyr to count occurrences based on another column



Here is a table:

origin    fans
USA        67
UK         56
GERMANY    56
USA        55
UK         76
GERMANY    43
USA        51
UK         51
GERMANY    48

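This data frame is called music_fans. How can I add a column holding the total number of fans for each country, so that the third column looks like this: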

origin    fans  total_fans
USA        67    173
UK         56    183
GERMANY    56    147
USA        55    173
UK         76    183
GERMANY    43    147
USA        51    173
UK         51    183
GERMANY    48    147

You can compute the per-group sum with dplyr:

library(dplyr)

# mutate() after group_by() keeps every row and repeats the group sum on each
music_fans %>%
  group_by(origin) %>%
  mutate(total_fans = sum(fans, na.rm = TRUE))

  origin   fans total_fans
  <chr>   <int>      <int>
1 USA        67        173
2 UK         56        183
3 GERMANY    56        147
4 USA        55        173
5 UK         76        183
6 GERMANY    43        147
7 USA        51        173
8 UK         51        183
9 GERMANY    48        147
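The result above is still grouped by origin. As a minimal sketch, assuming dplyr 1.1.0 or later is installed, the same column can be added with the per-operation `.by` argument, which groups only for that one call and returns an ungrouped data frame. The name `result` is just an illustrative choice.

```r
library(dplyr)

# Same sample data as in the question
music_fans <- data.frame(
  origin = c("USA", "UK", "GERMANY", "USA", "UK",
             "GERMANY", "USA", "UK", "GERMANY"),
  fans = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# .by groups only for this mutate() call (assumes dplyr >= 1.1.0),
# so no ungroup() is needed afterwards
result <- music_fans %>%
  mutate(total_fans = sum(fans, na.rm = TRUE), .by = origin)

print(result)
```

On an older dplyr, the `group_by()` version followed by `ungroup()` gives the same column.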

Or in base R:

music_fans$total_fans <- ave(music_fans$fans, music_fans$origin, FUN = sum, na.rm = TRUE)

Data:

music_fans <- structure(list(origin = c("USA", "UK", "GERMANY", "USA", "UK",
"GERMANY", "USA", "UK", "GERMANY"), fans = c(67L, 56L, 56L, 55L, 76L,
43L, 51L, 51L, 48L)), class = "data.frame", row.names = c(NA, -9L))

Here is a data.table approach:

library(data.table)
library(dplyr)  # provides %>% and left_join()
setDT(music_fans)[, .(total_fans = sum(fans)), by = 'origin'] %>%
  left_join(music_fans, by = 'origin')
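The aggregate-then-join version above reorders rows by group and puts total_fans before fans. As a sketch of an alternative (not the original answerer's code), data.table's `:=` operator can add the grouped sum in place, keeping the original row and column order and needing no join. The name `fans_dt` is an illustrative choice.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table
fans_dt <- data.table(
  origin = c("USA", "UK", "GERMANY", "USA", "UK",
             "GERMANY", "USA", "UK", "GERMANY"),
  fans = c(67L, 56L, 56L, 55L, 76L, 43L, 51L, 51L, 48L)
)

# := adds the column by reference; the per-group sum is recycled
# to every row of its group
fans_dt[, total_fans := sum(fans, na.rm = TRUE), by = origin]

print(fans_dt)
```

Because `:=` modifies `fans_dt` by reference, no assignment back to the variable is needed.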
