r - How to mutate based on ranges of numbers when using dplyr



I'd like to ask whether anyone knows how to solve my problem:

`master1 <- master %>%
  mutate(underweight = BMI < 18.5,
         normal = BMI %in% c(18.5, 24.9),
         overweight = BMI %in% c(25, 29.9),
         obese = BMI %in% c(30, 34.9),
         extreme = BMI > 35)`

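I want to create a new column for each category. This works for underweight and extreme, but for normal, overweight and obese I think I have coded the number ranges incorrectly...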

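`%in%` only checks for exact values: `BMI %in% c(18.5, 24.9)` asks whether BMI is exactly 18.5 or exactly 24.9, not whether it lies between them. Since these are floating-point numbers, exact matches can also fail. Here we need comparison operators (`<`, `>`, `&`) or `between()`: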

library(dplyr)
master1 <- master %>%
  mutate(underweight = BMI < 18.5,
         normal      = between(BMI, 18.5, 24.9),  # 18.5 <= BMI <= 24.9
         overweight  = between(BMI, 25, 29.9),
         obese       = between(BMI, 30, 34.9),
         extreme     = BMI >= 35)                 # >= so that a BMI of exactly 35 is not left out
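
A minimal sketch of the difference between `%in%` and `between()`, using a few hand-picked BMI values (the `bmi` vector is a hypothetical example, not data from the question):

```r
library(dplyr)

# Hypothetical BMI values for illustration
bmi <- c(18.5, 21.3, 24.9, 27.0)

# %in% tests membership in the listed values: is each BMI exactly 18.5 or exactly 24.9?
bmi %in% c(18.5, 24.9)
# [1]  TRUE FALSE  TRUE FALSE

# between() tests the closed interval 18.5 <= x <= 24.9
between(bmi, 18.5, 24.9)
# [1]  TRUE  TRUE  TRUE FALSE

# Exact matching on doubles is fragile as well: 0.1 + 0.2 is not exactly 0.3
(0.1 + 0.2) %in% 0.3
# [1] FALSE
```

`between(x, left, right)` is shorthand for `x >= left & x <= right`, so both endpoints are included.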

I created a reproducible example (reprex) to help explain, using the {dplyr} package:

library(dplyr)
set.seed(1)  # arbitrary seed so that the random values below are reproducible
# Create a data frame for 34 subjects
# and add a BMI column with random values between 15 and 34
master1 <- data.frame(
  subjects = sample(1:34)
) %>%
  mutate(BMI = runif(34, min = 15, max = 34),
         BMI = round(BMI, digits = 1))  # round to 1 decimal place

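When recoding, it is better to recode into a single column rather than one logical column per category; this is what is called long-format data: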

# Create a column called category with the labels for the weight groups
master1 <- master1 %>%
  mutate(category = case_when(BMI < 18.5 ~ "underweight",
                              BMI >= 18.5 & BMI <= 24.9 ~ "normal",
                              BMI >= 25 & BMI <= 29.9 ~ "overweight",
                              BMI >= 30 & BMI <= 34.9 ~ "obese",
                              BMI >= 35 ~ "extreme",
                              TRUE ~ NA_character_  # anything outside these ranges returns NA, meaning no category matched
  ))
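
If the intended cut points are 18.5, 25, 30 and 35, a gap-free alternative is base R's `cut()` with half-open intervals (a swapped-in technique, not what the answer uses). With conditions such as `BMI <= 24.9` and `BMI >= 25`, a value like 24.95 matches nothing and becomes `NA`; with `right = FALSE`, `cut()` places every value in exactly one interval `[a, b)`. The `example_df` data frame below is hypothetical:

```r
library(dplyr)

# Hypothetical BMI values, including boundary values and one with two decimals
example_df <- data.frame(BMI = c(16.2, 18.5, 24.95, 25.0, 31.4, 35.0, 38.7))

example_df <- example_df %>%
  mutate(category = cut(BMI,
                        breaks = c(-Inf, 18.5, 25, 30, 35, Inf),
                        right  = FALSE,  # intervals are [a, b): 25 is "overweight", not "normal"
                        labels = c("underweight", "normal", "overweight", "obese", "extreme")))

example_df

# One category column makes summaries a single call
count(example_df, category)
```

Because `cut()` returns a factor whose levels follow the breaks, `count()` lists the categories in BMI order rather than alphabetically.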
