R - How to reduce repetitive recoding of many values to the same value



How can I avoid writing out similar values over and over when recoding?

data <- data.frame(id=c(1,2,3,4,5,6,7,8,9,10,11,12,13),
value=c("1c","3d", "1e","1f","1g", "4h", "1j", 
"2f", "2c",  "2d", "2j", "2e", "5i"))

The code I am using is:

library(tidyverse)
data %>% 
mutate(category = recode(value, "1c"=1,"3d"=1, "1e"=1,"1f"=1,"1g"=1, "4h"=1, "1j"=1, 
"2f"=2, "2c"=2,  "2d"=1, "2j"=1, "2e"=2, "5i"=2))

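I would like to use something like ifelse(c(), 1, 2), so that the values coded as 1 are listed together in one group and the remaining values fall into the other group.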

You can use substr:

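dat <- data  # the question's data frame, called dat from here on
# First character of each value, e.g. "1c" -> "1"
subs <- substr(dat$value, 1, 1)
# Prefix 1 or 4 goes to group 1, everything else to group 2
dat$group <- ifelse(subs == 1 | subs == 4, 1, 2)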
dat
#    id value group
# 1   1    1c     1
# 2   2    3d     2
# 3   3    1e     1
# 4   4    1f     1
# 5   5    1g     1
# 6   6    4h     1
# 7   7    1j     1
# 8   8    2f     2
# 9   9    2c     2
# 10 10    2d     2
# 11 11    2j     2
# 12 12    2e     2
# 13 13    5i     2
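
The prefix rule above does not reproduce every assignment in the question's recode() call, which maps "3d", "2d" and "2j" to 1. When the grouping cannot be derived from a prefix, one option closer to the ifelse(c(), 1, 2) idea is to list the members of one group once and test membership with %in%. A minimal sketch, assuming the data from the question (rebuilt here so it runs on its own); group_one is a name introduced only for illustration:

```r
# Data from the question, rebuilt so this sketch is self-contained.
dat <- data.frame(
  id = 1:13,
  value = c("1c", "3d", "1e", "1f", "1g", "4h", "1j",
            "2f", "2c", "2d", "2j", "2e", "5i")
)

# Values that the question's recode() call maps to 1 (illustrative name).
group_one <- c("1c", "3d", "1e", "1f", "1g", "4h", "1j", "2d", "2j")

# Members of group_one get 1, every other value gets 2.
dat$category <- ifelse(dat$value %in% group_one, 1, 2)
dat
```

Only one of the two groups has to be spelled out, which is where the saving over recode() comes from.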

Note that the comparison with == and | is also robust if there are NAs, whereas %in% is not. Demonstration:

dat$value[3] <- NA
subs2 <- substr(dat$value, 1, 1)
ifelse(subs2 == 1 | subs2 == 4, 1, 2)
# [1]  1  2 NA  1  1  1  1  2  2  2  2  2  2

whereas:

ifelse(subs2 %in% c(1, 4), 1, 2)
# [1] 1 2 2 1 1 1 1 2 2 2 2 2 2
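
The difference arises because NA == 1 evaluates to NA, while NA %in% c(1, 4) evaluates to FALSE, so ifelse takes its "no" branch and returns 2. If %in% is preferred but missing values should stay missing, one possible approach is to handle NA explicitly first. A small sketch, with a hypothetical vector prefixes standing in for subs2:

```r
# Hypothetical first characters, including one missing value.
prefixes <- c("1", "3", NA, "1", "4", "2")

# %in% never returns NA: a missing prefix simply counts as "not a member".
in_group_one <- prefixes %in% c("1", "4")

# Keep NA where the input is NA, otherwise assign group 1 or 2.
group <- ifelse(is.na(prefixes), NA_real_, ifelse(in_group_one, 1, 2))
group
# [1]  1  2 NA  1  1  2
```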
