Here is my data (dput output):
structure(list(Students = c(300L, 1600L, 100L, 90L, 2000L, 200L,
300L, 340L, 1500L, 500L, 360L, 820L, 150L, 1380L, NA, 360L, 400L,
1000L, 1600L, 142L, 250L, 2000L), Students_Primary = c(150L,
NA, 100L, 90L, 800L, NA, NA, 150L, NA, 250L, 220L, 400L, NA,
750L, NA, NA, NA, 600L, NA, 142L, NA, 500L), Chinese_Spoken = c("Mandarin",
"Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin",
"Mandarin", "Mandarin", "Mandarin", "Cantonese", "Mandarin",
"Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin",
"Mandarin", "Both", "Mandarin", "Both"), Chinese_Written = c("Simplified",
"Traditional", "Simplified", "Traditional", "Both", "Traditional",
"Traditional", "Simplified", "Simplified", NA, "Traditional",
"Both", NA, "Both", "Both", "Simplified", "Both", "Traditional",
"Traditional", "Traditional", "Simplified", "Both")), class = "data.frame", row.names = c(NA,
-22L))
I want to summarize how many students use each form of written Chinese, so I tried the following code:
school %>%
select(Chinese_Written, Students) %>%
group_by(Chinese_Written) %>%
arrange(Chinese_Written) %>%
na.omit()
It returns this:
Chinese_Written Students
<chr> <int>
1 Both 2000
2 Both 820
3 Both 1380
4 Both 400
5 Both 2000
6 Simplified 300
7 Simplified 100
8 Simplified 340
9 Simplified 1500
10 Simplified 360
11 Simplified 250
12 Traditional 1600
13 Traditional 90
14 Traditional 200
15 Traditional 300
16 Traditional 360
17 Traditional 1000
18 Traditional 1600
19 Traditional 142
Is there a reason the rows are not being combined? I want all the "Both", "Simplified", and "Traditional" rows grouped together.
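group_by on its own does not change the rows; it only makes the commands that follow operate per group. So you can group_by Chinese_Written and then use summarise to sum the Students variable: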
library(dplyr)
school %>%
group_by(Chinese_Written) %>%
summarise(Students = sum(Students,na.rm = TRUE))
# A tibble: 4 x 2
Chinese_Written Students
<chr> <int>
1 Both 6600
2 Simplified 2850
3 Traditional 5292
4 NA 650
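If the NA row is not wanted in the result, one option is to drop rows with a missing Chinese_Written before grouping. The sketch below is only an illustration: `demo` is a small made-up data frame standing in for `school`, and its values are not taken from the question. It also shows that group_by only records the grouping and leaves the rows as they were.

```r
library(dplyr)

# Hypothetical stand-in for `school`; the values are illustrative only
demo <- data.frame(
  Students = c(100L, 200L, NA, 50L, 25L),
  Chinese_Written = c("Both", "Simplified", "Both", NA, "Simplified")
)

# group_by() only attaches grouping metadata; the rows are unchanged
grouped <- group_by(demo, Chinese_Written)
group_vars(grouped)            # "Chinese_Written"
nrow(grouped) == nrow(demo)    # TRUE

# Drop rows whose Chinese_Written is missing, then sum Students per group
totals <- demo %>%
  filter(!is.na(Chinese_Written)) %>%
  group_by(Chinese_Written) %>%
  summarise(Students = sum(Students, na.rm = TRUE))

totals
```

Filtering on Chinese_Written alone, rather than calling na.omit() on the whole data frame, keeps rows where only Students is missing; those are then handled by na.rm = TRUE inside sum.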