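I'm new to R and trying to figure out how to create a new variable based on the values of another variable in a data frame. I have many observations and want to group them into small (fewer than 15 observations), medium (15-20 obs) and large (more than 20 obs); that is, I'm trying to recode class_size into an ordered variable. For example, if I have the following data: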
df <- data.frame(student_id = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
                 class_size = c(10, 15, 20, 15, 35, 25, 11, 40, 40, 10))
I'd like to get the following result:
student_id  class_size  new_class_size
A           10          small
B           15          medium
C           20          medium
D           15          medium
E           35          large
F           25          large
G           11          small
H           40          large
I           40          large
J           10          small
I looked at the case_when function, but it didn't give me what I wanted. How can I recode the class_size variable in R?
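We can use cut, giving the cut points in breaks and the group names in labels. By default cut() builds right-closed intervals (right = TRUE), so a value lying exactly on a break goes into the lower group; with whole-number class sizes, breaks of -Inf, 14, 20 and Inf give small (14 or fewer), medium (15 to 20) and large (21 or more). Setting ordered_result = TRUE returns an ordered factor:
library(dplyr)
df <- df %>%
  mutate(new_class_size = cut(class_size,
                              # right-closed intervals: (-Inf, 14], (14, 20], (20, Inf]
                              breaks = c(-Inf, 14, 20, Inf),
                              labels = c("small", "medium", "large"),
                              ordered_result = TRUE))  # small < medium < large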
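The break values matter because of how cut() treats boundaries: with its default right = TRUE, every interval is closed on the right, so a value equal to a break falls into the lower group. A minimal base-R check on the values around the 15 and 20 thresholds, assuming whole-number class sizes (the names sizes, size_labels, groups and edge are just illustrative):

```r
# Boundary values around the 15 and 20 thresholds (whole-number class sizes assumed)
sizes       <- c(14, 15, 20, 21)
size_labels <- c("small", "medium", "large")

# Default right = TRUE builds the intervals (-Inf, 14], (14, 20], (20, Inf]
groups <- cut(sizes, breaks = c(-Inf, 14, 20, Inf),
              labels = size_labels, ordered_result = TRUE)
print(data.frame(size = sizes, group = groups))

# With breaks = c(-Inf, 15, 20, Inf), a size of exactly 15 lands in (-Inf, 15], i.e. "small"
edge <- cut(15, breaks = c(-Inf, 15, 20, Inf), labels = size_labels)
print(as.character(edge))
```

Because groups is an ordered factor, comparisons such as groups[1] < groups[4] are meaningful (small < large).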
Data (the example input, as produced by dput()):
df <- structure(list(student_id = c("A", "B", "C", "D", "E", "F", "G",
"H", "I", "J"), class_size = c(10, 15, 20, 15, 35, 25, 11, 40,
40, 10)), class = "data.frame", row.names = c(NA, -10L))
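Since the question mentions case_when: a minimal sketch of the same recoding with dplyr::case_when, assuming dplyr is installed. The conditions are checked in order and the first match wins, so class_size <= 20 only needs to handle values that were not already classed as small. The final factor() step is an addition that turns the labels into an ordered variable, as the question asks.

```r
library(dplyr)

df <- data.frame(student_id = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
                 class_size = c(10, 15, 20, 15, 35, 25, 11, 40, 40, 10))

df <- df %>%
  mutate(new_class_size = case_when(
    class_size < 15  ~ "small",   # fewer than 15
    class_size <= 20 ~ "medium",  # 15 to 20 inclusive (first match wins)
    TRUE             ~ "large"    # more than 20
  )) %>%
  # make the labels an ordered factor: small < medium < large
  mutate(new_class_size = factor(new_class_size,
                                 levels = c("small", "medium", "large"),
                                 ordered = TRUE))

print(df)
```

Unlike cut(), this spells out the rule with explicit comparisons, so it also behaves as stated for non-integer sizes such as 14.5 (small).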