r - How do I categorize all the different entries as Male, Female and Non-binary?



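I want to make all the gender entries ('Male', 'female', 'Woman', 'man', 'male', etc.) more consistent, so that there are only three categories (Male, Female and Non-binary). This is my current code: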

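# Cleaning of Specific Variable Types
library(dplyr)

# Keep only the first letter of each entry, upper-cased
removed <- removed %>%
  mutate(gender = substr(toupper(gender), 1, 1))

# Map the first letter to a category
removed <- removed %>%
  mutate(gender = case_when(
    gender == "M" ~ "Male",
    gender == "F" ~ "Female",
    gender == "N" ~ "Non-binary"
  ))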
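To see why the code above leaves many rows as NA, it helps to look at the first letters it actually produces. A minimal base-R sketch, assuming a small vector of spellings taken from the sample data further down this page:

```r
# A few raw spellings taken from the sample data in this thread
gender <- c("Male", "Woman", "other", "MtF", "female", "FtM", "androgyne")

# The same first-letter step as in the question
first <- substr(toupper(gender), 1, 1)
print(first)

# Letters that the question's case_when handles
handled <- first %in% c("M", "F", "N")

# Entries that match no condition and would therefore become NA
print(gender[!handled])
```

Note that "MtF" and "FtM" do get a handled letter (M and F), so this step silently files them under Male and Female.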

This may be the longer version, but it should work (data from jay.sf, thanks!):

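  1. Capitalize the first letter of each entry (title case)
  2. Check the unique entries in gender
  3. Create a pattern for each category
  4. Apply the case_when conditions using str_detect and the patterns: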
library(dplyr)
library(stringr)

# Capitalize each value so that the "Male"/"Man" patterns do not match
# inside "Female"/"Woman" (str_detect is case-sensitive)
removed$gender <- str_to_title(removed$gender)

# Check for unique elements in `gender`
unique(removed$gender)
# [1] "Male"      "Woman"     "Other"     "Mtf"       "Female"   
# [6] "Man"       "Ftm"       "Androgyne"

# Define a pattern for each category
Male <- paste(c("Male", "Man"), collapse = "|")
Female <- paste(c("Woman", "Female"), collapse = "|")
Non_binary <- paste(c("Other", "Mtf", "Ftm", "Androgyne"), collapse = "|")

# Apply the categories with `case_when` and the patterns
removed %>%
  mutate(gender = case_when(
    str_detect(gender, Male) ~ "Male",
    str_detect(gender, Female) ~ "Female",
    str_detect(gender, Non_binary) ~ "Non-binary"
  ))

Output:

gender
1        Male
2      Female
3        Male
4  Non-binary
5  Non-binary
6      Female
7        Male
8  Non-binary
9        Male
10       Male
11       Male
12     Female
13 Non-binary
14     Female
15     Female
16 Non-binary
17       Male
18     Female
19 Non-binary
20 Non-binary
21 Non-binary
22     Female
23     Female
24     Female
25     Female
26       Male
27 Non-binary
28       Male
29     Female
30 Non-binary
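For readers without stringr, the same pattern-matching idea can be sketched in base R with grepl(). This is a swapped-in base-R version, not the answer's code: the title-casing line is a hand-rolled stand-in for str_to_title(), and the input vector is a sample of spellings from the thread's data.

```r
gender <- c("Male", "Woman", "other", "MtF", "female", "man", "FtM", "androgyne")

# Hand-rolled title case (stand-in for stringr::str_to_title):
# upper-case the first letter, lower-case the rest
title <- paste0(toupper(substr(gender, 1, 1)), tolower(substring(gender, 2)))

# One alternation pattern per category, as in the answer
male_pat   <- paste(c("Male", "Man"), collapse = "|")
female_pat <- paste(c("Woman", "Female"), collapse = "|")
nb_pat     <- paste(c("Other", "Mtf", "Ftm", "Androgyne"), collapse = "|")

# grepl() is case-sensitive, so "Male" does not match inside "Female"
# and "Man" does not match inside "Woman"
category <- ifelse(grepl(male_pat, title), "Male",
            ifelse(grepl(female_pat, title), "Female",
            ifelse(grepl(nb_pat, title), "Non-binary", NA_character_)))
print(category)
```

Anchoring the patterns (for example "^(Male|Man)$") would be stricter if the column can contain longer free-text answers.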

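The problem seems to be the default value for gender: entries that don't start with M, F or N (for example "Woman", "other" or "androgyne") match no condition, so case_when returns NA for them. Use TRUE as a catch-all instead of matching "N".
Tested with the data from jay.sf's answer.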

library(dplyr)
removed %>%
  mutate(
    gender = toupper(substr(gender, 1, 1)),
    gender = case_when(
      gender == "M" ~ "Male",
      gender %in% c("F", "W") ~ "Female",
      TRUE ~ "Non-binary"
    ))
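A base-R sketch of the same first-letter rule with a catch-all default, using a few sample spellings from the thread (an assumption, not the full data). It also makes one trade-off visible: "MtF" starts with M and "FtM" with F, so this rule files them under Male and Female, whereas the pattern-based answer classes them as Non-binary.

```r
gender <- c("Male", "Woman", "other", "MtF", "female", "FtM", "androgyne")

# First letter, upper-cased, as in the answer
first <- toupper(substr(gender, 1, 1))

# "M" -> Male, "F"/"W" -> Female, everything else falls through
# to the default, like TRUE ~ "Non-binary" in case_when
category <- ifelse(first == "M", "Male",
            ifelse(first %in% c("F", "W"), "Female", "Non-binary"))
print(category)
```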

You probably have a data frame like this:

removed
#   gender
# 1   Male
# 2  Woman
# 3   Male
# 4  other
# 5    MtF
# 6 female
# ...

You can now create a key table in a semi-automated way, like this:

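# The codes follow the alphabetical order of the unique lower-case entries:
# androgyne, female, ftm, male, man, mtf, other, woman
key <- data.frame(x = sort(unique(tolower(removed$gender))),
                  y = factor(c(3, 1, 3, 2, 2, 3, 3, 1),
                             labels = c('female', 'male', 'non-binary')))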

Then use match to replace the labels:

library(dplyr)
removed %>%
  mutate(gender = key$y[match(tolower(gender), key$x)])
#        gender
# 1        male
# 2      female
# 3        male
# 4  non-binary
# 5  non-binary
# 6      female
# 7         ...
Data

removed <- structure(list(gender = c("Male", "Woman", "Male", "other", "MtF", 
"female", "male", "MtF", "Male", "man", "Man", "female", "other", 
"Woman", "female", "MtF", "male", "Female", "other", "other", 
"FtM", "female", "Woman", "Woman", "female", "male", "androgyne", 
"man", "Female", "MtF")), class = "data.frame", row.names = c(NA, 
-30L))
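The key table above depends on the hard-coded codes lining up with the output of sort(). A variation (a sketch, not the answer's code) is a named character vector that spells out each mapping, so the order no longer matters; any spelling missing from the key comes back as NA, which makes new variants easy to spot. The input vector below is a sample from the data above.

```r
gender <- c("Male", "Woman", "other", "MtF", "female", "man", "FtM", "androgyne")

# Explicit lookup: lower-case spelling -> category
key <- c(male = "male", man = "male",
         female = "female", woman = "female",
         other = "non-binary", mtf = "non-binary",
         ftm = "non-binary", androgyne = "non-binary")

# Index the key by name; spellings not in the key would give NA
category <- unname(key[tolower(gender)])
print(category)

# Optional: a factor with a fixed set of levels
category_f <- factor(category, levels = c("female", "male", "non-binary"))
```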
