This is my dataset:
X Totally.Disagree Disagree Agree Totally.agree
0 2 9 111 122
1 2 30 124 88
2 4 31 119 90
3 10 43 138 53
4 33 54 85 72
5 43 79 89 33
6 48 83 94 19
7 51 98 80 15
8 50 102 75 17
9 51 96 80 17
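Each X (that is, each row) is a question, and the values are the number of people who chose each answer to that question. I want to compute the mode (the most frequently chosen answer) for each question.
This is what I tried: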
df <- gather(df,Answer, count, Totally.Disagree:Totally.agree )
df %>%
group_by(X, Answer) %>%
summarise(sum = count)%>%
summarise(mode = df$Answer[which(df$count== max(df$count))])
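But it doesn't work, because max(df$count) refers to the whole dataset rather than to a single question.
I don't know whether the way I tried is even the right approach. If any of you could help me solve this, I would be very grateful.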
One possibility:
df %>%
mutate(mode = max.col(.[2:length(.)])+1) %>%
rowwise() %>%
mutate(mode = names(.)[[mode]]) %>%
select(X, mode)
X mode
<int> <chr>
1 0 Totally.agree
2 1 Agree
3 2 Agree
4 3 Agree
5 4 Agree
6 5 Agree
7 6 Agree
8 7 Disagree
9 8 Disagree
10 9 Disagree
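First, it identifies the index of the column with the largest count in each row, then it looks up the column name for that index. The +1 is needed because max.col() is applied to the answer columns only, so its indices are offset by the X column.
If you also want to include the counts, you can try: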
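The index-to-name step can also be sketched in base R. This is only an illustration, not the code used above: the variable names `answers`, `idx` and `mode_base` are made up, and `ties.method = "first"` is a choice made here, because `max.col()` breaks ties at random by default.

```r
# Rebuild the example data from the question (same counts as the table above).
df <- data.frame(
  X = 0:9,
  Totally.Disagree = c(2, 2, 4, 10, 33, 43, 48, 51, 50, 51),
  Disagree = c(9, 30, 31, 43, 54, 79, 83, 98, 102, 96),
  Agree = c(111, 124, 119, 138, 85, 89, 94, 80, 75, 80),
  Totally.agree = c(122, 88, 90, 53, 72, 33, 19, 15, 17, 17)
)

# Drop X so that only the answer columns remain (no +1 offset needed).
answers <- df[, -1]

# Per-row index of the largest count; "first" makes ties deterministic
# (assumption of this sketch, the default is "random").
idx <- max.col(answers, ties.method = "first")

# Map each column index back to the corresponding answer name.
mode_base <- names(answers)[idx]
```

Because `X` is dropped before calling `max.col()`, the indices refer directly to `names(answers)`.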
df %>%
mutate(mode = max.col(.[2:length(.)])+1) %>%
rowwise() %>%
mutate(mode_names = names(.)[[mode]],
mode_numbers = max(!!! rlang::syms(names(.)[2:length(.)]))) %>%
select(X, mode_names, mode_numbers)
X mode_names mode_numbers
<int> <chr> <dbl>
1 0 Totally.agree 122.
2 1 Agree 124.
3 2 Agree 119.
4 3 Agree 138.
5 4 Agree 85.
6 5 Agree 89.
7 6 Agree 94.
8 7 Disagree 98.
9 8 Disagree 102.
10 9 Disagree 96.
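As a rough base R counterpart that returns both the name and the count, matrix indexing with (row, column) pairs can be used. This is a sketch on the first three questions only; `answers`, `idx` and `result` are illustrative names, and ties are resolved to the first column by assumption.

```r
# First three questions from the table above.
df <- data.frame(
  X = 0:2,
  Totally.Disagree = c(2, 2, 4),
  Disagree = c(9, 30, 31),
  Agree = c(111, 124, 119),
  Totally.agree = c(122, 88, 90)
)

# Work on a numeric matrix of the answer columns only.
answers <- as.matrix(df[, -1])

# Column index of the largest count in each row.
idx <- max.col(answers, ties.method = "first")

result <- data.frame(
  X = df$X,
  mode_names = colnames(answers)[idx],
  # A two-column index matrix selects exactly one cell per row.
  mode_numbers = answers[cbind(seq_len(nrow(answers)), idx)]
)
```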
Or, following your original logic:
df %>%
gather(mode_names, mode_numbers, -X) %>%
group_by(X) %>%
filter(mode_numbers == max(mode_numbers)) %>%
arrange(X)
X mode_names mode_numbers
<int> <chr> <int>
1 0 Totally.agree 122
2 1 Agree 124
3 2 Agree 119
4 3 Agree 138
5 4 Agree 85
6 5 Agree 89
7 6 Agree 94
8 7 Disagree 98
9 8 Disagree 102
10 9 Disagree 96
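One thing to keep in mind with the filter() approach is that it keeps every row that reaches the group maximum, so a question with tied answers would appear more than once. The sketch below uses a small hypothetical data set (`tied`) to show this, and assumes tidyr >= 1.0 and dplyr >= 1.0 for `pivot_longer()` and `slice_max()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: question 1 has a tie between Disagree and Agree.
tied <- tibble(X = 0:1, Disagree = c(5, 7), Agree = c(9, 7))

# Long format: one row per question/answer pair.
long <- tied %>%
  pivot_longer(-X, names_to = "mode_names", values_to = "mode_numbers")

# filter() keeps all rows equal to the group maximum, so ties give several rows.
all_modes <- long %>%
  group_by(X) %>%
  filter(mode_numbers == max(mode_numbers)) %>%
  ungroup()

# slice_max(with_ties = FALSE) keeps exactly one row per group.
one_mode <- long %>%
  group_by(X) %>%
  slice_max(mode_numbers, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Here `all_modes` has three rows (two for the tied question) while `one_mode` has one row per question. Which of the two is preferable depends on whether tied modes should be reported.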
If you only want the answer itself (without the counts), and we can assume there are no ties, then:
df <- gather(df, Answer, count, Totally.Disagree:Totally.agree)
df %>% group_by(X) %>% summarise(mode = Answer[which.max(count)])
# A tibble: 10 x 2
# X mode
# <int> <chr>
# 1 0 Totally.agree
# 2 1 Agree
# 3 2 Agree
# 4 3 Agree
# 5 4 Agree
# 6 5 Agree
# 7 6 Agree
# 8 7 Disagree
# 9 8 Disagree
# 10 9 Disagree
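Answer[which.max(count)] is basically what you intended to do, but without df$, because you want these computations done per group. Writing df$count pulls the whole column from the original data frame and ignores the grouping.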
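The difference between a bare column name and a `df$` reference inside a grouped `summarise()` can be seen on a small hypothetical data set. The names `long`, `per_group`, `group_max` and `global_max` are made up for this sketch.

```r
library(dplyr)

# Hypothetical long-format data: two questions, two answers each.
long <- tibble(
  X = c(0, 0, 1, 1),
  Answer = c("Agree", "Disagree", "Agree", "Disagree"),
  count = c(10, 3, 4, 8)
)

per_group <- long %>%
  group_by(X) %>%
  summarise(
    mode = Answer[which.max(count)],  # bare columns: evaluated within each group
    group_max = max(count),           # maximum of the current group only
    global_max = max(long$count)      # long$count bypasses the grouping
  )
```

In `per_group`, `group_max` differs between the two questions, whereas `global_max` is the same for both, which is the behaviour described in the question.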