Aggregating data across the different categories of a column in R



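I am working with data in which a list of students (ID) is linked to their favourite sports, which can only be chosen from 7 different sports. A single ID may have more than one favourite sport. Here is a snapshot of the data.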

ID    Sports
1    Soccer
2    Basketball
3    Tennis
1    Basketball
4    Soccer
2    Hockey
3    Basketball
5    Soccer
6    Rafting
2    surfing
1    Hockey
6    Soccer
7    Tennis

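I need to create a data set that shows how many different sports each student (ID) likes and lists those sports. The result should look something like this: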

ID    count    All Favourite Sports
1     3        Soccer, Basketball, Hockey
2     3        Basketball, Hockey, surfing
3     2        Tennis, Basketball
4     1        Soccer
5     1        Soccer
6     2        Rafting, Soccer
7     1        Tennis

You can do this with the dplyr package and the code below. Note that data should be the name of the data frame from your question:

library(dplyr)

data %>%
  group_by(ID) %>%
  # n_distinct() counts each sport once per ID; unique() keeps the list consistent with that count
  summarize(count = n_distinct(Sports),
            all_sports = toString(unique(Sports))) %>%
  ungroup()
# A tibble: 7 x 3
#      ID count all_sports
#   <int> <int> <chr>
# 1     1     3 Soccer, Basketball, Hockey
# 2     2     3 Basketball, Hockey, surfing
# 3     3     2 Tennis, Basketball
# 4     4     1 Soccer
# 5     5     1 Soccer
# 6     6     2 Rafting, Soccer
# 7     7     1 Tennis
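To try a pipeline of this kind without the original data, the snapshot from the question can be rebuilt by hand. The following is only a minimal sketch: the name `data` follows the answer, `result` is a hypothetical name, and the extra duplicated row (ID 1, Soccer) is an assumption added purely to show why `n_distinct()` and `unique()` matter when the same ID/sport pair appears more than once.

```r
library(dplyr)

# Snapshot from the question, plus one duplicated row (ID 1, Soccer)
# appended as an assumption to illustrate de-duplication.
data <- data.frame(
  ID = c(1, 2, 3, 1, 4, 2, 3, 5, 6, 2, 1, 6, 7, 1),
  Sports = c("Soccer", "Basketball", "Tennis", "Basketball", "Soccer",
             "Hockey", "Basketball", "Soccer", "Rafting", "surfing",
             "Hockey", "Soccer", "Tennis", "Soccer"),
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(ID) %>%
  summarize(count = n_distinct(Sports),               # distinct sports per ID
            all_sports = toString(unique(Sports)))    # each sport listed once
```

Even with the repeated row, ID 1 should still be reported with three sports, because both the count and the list are built from distinct values.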

Another approach you can try:

library(dplyr)

df %>%
  group_by(ID) %>%
  # n_distinct() rather than n(), so repeated ID/sport rows are not counted twice
  transmute(ID,
            count = n_distinct(Sports),
            `All Favourite Sports` = paste(unique(Sports), collapse = ", ")) %>%
  # transmute() keeps one row per input row, so keep only the first row of each group
  slice(1) %>%
  ungroup()
#       ID count `All Favourite Sports`
#     <int> <int> <chr>
# 1     1     3   Soccer, Basketball, Hockey
# 2     2     3   Basketball, Hockey, surfing
# 3     3     2   Tennis, Basketball
# 4     4     1   Soccer
# 5     5     1   Soccer
# 6     6     2   Rafting, Soccer
# 7     7     1   Tennis
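If dplyr is not available, the same summary can be sketched in base R. This is not part of either answer; it is an alternative under the assumption that the data frame is called `df` and has the columns `ID` and `Sports` shown in the question. The names `sports_by_id` and `result` are hypothetical.

```r
# Snapshot from the question, rebuilt by hand
df <- data.frame(
  ID = c(1, 2, 3, 1, 4, 2, 3, 5, 6, 2, 1, 6, 7),
  Sports = c("Soccer", "Basketball", "Tennis", "Basketball", "Soccer",
             "Hockey", "Basketball", "Soccer", "Rafting", "surfing",
             "Hockey", "Soccer", "Tennis"),
  stringsAsFactors = FALSE
)

# Split Sports by ID and keep only the distinct values within each ID
sports_by_id <- lapply(split(df$Sports, df$ID), unique)

# One row per ID: number of distinct sports and a comma-separated list
result <- data.frame(
  ID = as.integer(names(sports_by_id)),
  count = unname(lengths(sports_by_id)),
  all_sports = unname(vapply(sports_by_id, toString, character(1))),
  stringsAsFactors = FALSE
)
```

Here `split()` groups the sports by ID, `lengths()` gives the per-ID count, and `toString()` joins each group with ", ", which mirrors what the dplyr pipelines do.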
