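I am working with data in which a list of students (ID) is linked to their favourite sports, which can only be chosen from 7 different sports. A single unique ID may have more than one favourite sport. Here is a snapshot of it.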
ID Sports
1 Soccer
2 Basketball
3 Tennis
1 Basketball
4 Soccer
2 Hockey
3 Basketball
5 Soccer
6 Rafting
2 surfing
1 Hockey
6 Soccer
7 Tennis
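I need to create a dataset that shows how many different sports each student (ID) likes, along with the sports themselves. Some of the expected results are below: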
ID count All Favourite Sports
1 3 Soccer, Basketball, Hockey
2 3 Basketball, Hockey, surfing
3 2 Tennis, Basketball
4 1 Soccer
5 1 Soccer
6 2 Rafting, Soccer
7 1 Tennis
You can use the dplyr package and the following code to do this. Note that data should be the name of the data frame from your question:
> library(dplyr)
> data %>% group_by(ID) %>%
+ summarize(count = n_distinct(Sports),
+ all_sports = toString(unique(Sports))) %>%
+ ungroup()
# A tibble: 7 x 3
ID count all_sports
<int> <int> <chr>
1 1 3 Soccer, Basketball, Hockey
2 2 3 Basketball, Hockey, surfing
3 3 2 Tennis, Basketball
4 4 1 Soccer
5 5 1 Soccer
6 6 2 Rafting, Soccer
7 7 1 Tennis
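If you would rather not depend on dplyr, the same idea (split the sports by ID, then count and paste the unique values) can be sketched in base R. This is only a minimal sketch: the data frame is rebuilt by hand from the snapshot in the question, and the names data, sports_by_id and result are my own assumptions, not anything from the original post.

```r
# Rebuild the snapshot from the question by hand (the name `data` is an assumption).
data <- data.frame(
  ID = c(1, 2, 3, 1, 4, 2, 3, 5, 6, 2, 1, 6, 7),
  Sports = c("Soccer", "Basketball", "Tennis", "Basketball", "Soccer",
             "Hockey", "Basketball", "Soccer", "Rafting", "surfing",
             "Hockey", "Soccer", "Tennis"),
  stringsAsFactors = FALSE
)

# One character vector of sports per ID, in order of appearance.
sports_by_id <- split(data$Sports, data$ID)

# Count the distinct sports and collapse them into one comma-separated string.
result <- data.frame(
  ID = as.integer(names(sports_by_id)),
  count = unname(vapply(sports_by_id, function(s) length(unique(s)), integer(1))),
  all_sports = unname(vapply(sports_by_id, function(s) toString(unique(s)), character(1))),
  stringsAsFactors = FALSE
)

print(result)
```

Using unique() in both columns keeps the count and the listed sports consistent if the same ID/sport pair ever appears more than once.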
Another approach you can try:
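library(dplyr)
# df is the data frame from the question
df %>%
group_by(ID) %>%
# n_distinct() counts each sport once per ID, matching the unique() list
transmute(ID, count = n_distinct(Sports), `All Favourite Sports` = paste(unique(Sports), collapse = ", ")) %>%
# every row within an ID is now identical, so keep only the first one
slice(1) %>%
ungroup()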
# ID count `All Favourite Sports`
# <int> <int> <chr>
# 1 1 3 Soccer, Basketball, Hockey
# 2 2 3 Basketball, Hockey, surfing
# 3 3 2 Tennis, Basketball
# 4 4 1 Soccer
# 5 5 1 Soccer
# 6 6 2 Rafting, Soccer
# 7 7 1 Tennis