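I want to make all the gender entries (e.g. "Male", "female", "Woman", "man", "male") more consistent, so that the column has only 3 values (Male, Female and Non-binary). This is my current code: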
# Cleaning of Specific Variable Types
removed <- removed %>%
mutate(gender=substr(toupper(gender), 1, 1))
removed <- removed %>%
mutate(gender=case_when(
gender == "M"~"Male",
gender == "F"~"Female",
gender == "N"~"Non-binary")
)
This may be the longer version, but it should work (data from jay.sf, many thanks):
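- Capitalize the first letter of each value
- Check the unique entries in gender
- Create a pattern for each category
- Apply case_when with str_detect and the patterns as conditions: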
# load the packages first: str_to_title() and str_detect() come from stringr
library(dplyr)
library(stringr)
# Capitalize each value so that "Male"/"Man" cannot match inside "Female"/"Woman" in str_detect
removed$gender <- str_to_title(removed$gender)
# check for unique elements in `gender`
unique(removed$gender)
[1] "Male" "Woman" "Other" "Mtf" "Female"
[6] "Man" "Ftm" "Androgyne"
# define pattern for each category
Male <- paste(c("Male", "Man"), collapse = "|")
Female <- paste(c("Woman", "Female"), collapse = "|")
Non_binary <- paste(c("Other", "Mtf", "Ftm", "Androgyne"), collapse = "|")
# apply category with `case_when` and pattern:
removed %>%
  mutate(gender = case_when(
    str_detect(gender, Male) ~ "Male",
    str_detect(gender, Female) ~ "Female",
    str_detect(gender, Non_binary) ~ "Non-binary"))
Output:
gender
1 Male
2 Female
3 Male
4 Non-binary
5 Non-binary
6 Female
7 Male
8 Non-binary
9 Male
10 Male
11 Male
12 Female
13 Non-binary
14 Female
15 Female
16 Non-binary
17 Male
18 Female
19 Non-binary
20 Non-binary
21 Non-binary
22 Female
23 Female
24 Female
25 Female
26 Male
27 Non-binary
28 Male
29 Female
30 Non-binary
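To illustrate why the capitalization step matters, here is a minimal sketch. The vector `raw` and the names `naive`, `male_pattern` and `anchored` are made up for this example and are not part of the answer above. It shows that an unanchored lower-case pattern also matches inside "female" and "Woman", and that an anchored, case-insensitive pattern built with stringr's `regex()` is one possible way to avoid this without changing the case first.

```r
library(stringr)

# Hypothetical raw values, not yet capitalized
raw <- c("male", "female", "Woman", "man", "MtF")

# Unanchored lower-case pattern: "male" is found inside "female",
# and "man" is found inside "Woman"
naive <- str_detect(raw, "male|man")

# Anchored, case-insensitive pattern: only whole-string matches count
male_pattern <- regex("^(male|man)$", ignore_case = TRUE)
anchored <- str_detect(raw, male_pattern)

naive     # TRUE  TRUE  TRUE  TRUE FALSE
anchored  # TRUE FALSE FALSE  TRUE FALSE
```

With anchored patterns the result no longer depends on the capitalization or on the order of the case_when conditions, at the cost of having to list every spelling exactly.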
The problem seems to be the default value for gender. Use TRUE as the fallback condition instead of matching against "N".
Tested with the data from jay.sf's answer.
library(dplyr)
removed %>%
mutate(
gender = toupper(substr(gender, 1, 1)),
gender = case_when(
gender == "M" ~ "Male",
gender %in% c("F", "W") ~ "Female",
TRUE ~ "Non-binary"
))
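The difference can be seen on a small sample. The sketch below uses a hypothetical vector `first_letter`, standing in for the result of toupper(substr(gender, 1, 1)); the names `no_fallback` and `with_fallback` are also made up for illustration. When no condition matches, case_when returns NA, which is what happens to "W", "O" and "A" in the original code.

```r
library(dplyr)

# Hypothetical first letters, as produced by toupper(substr(gender, 1, 1))
first_letter <- c("M", "F", "W", "O", "A")

# No fallback: values that match no condition become NA
no_fallback <- case_when(
  first_letter == "M" ~ "Male",
  first_letter == "F" ~ "Female",
  first_letter == "N" ~ "Non-binary"
)

# TRUE as the last condition catches everything that is left
with_fallback <- case_when(
  first_letter == "M" ~ "Male",
  first_letter %in% c("F", "W") ~ "Female",
  TRUE ~ "Non-binary"
)

no_fallback    # "Male" "Female" NA NA NA
with_fallback  # "Male" "Female" "Female" "Non-binary" "Non-binary"
```

Note that the TRUE fallback also assigns "Non-binary" to missing values and typos, so it may be worth inspecting unique(removed$gender) before relying on it.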
You probably have a data frame like this.
removed
# gender
# 1 Male
# 2 Woman
# 3 Male
# 4 other
# 5 MtF
# 6 female
# ...
You can now create a key table in a semi-automated way like this.
key <- data.frame(x=sort(unique(tolower(removed$gender))),
y=factor(c(3, 1, 3, 2, 2, 3, 3, 1),
labels=c('female', 'male', 'non-binary')))
Then use match to replace the labels:
library(dplyr)
removed %>%
mutate(gender=key$y[match(tolower(gender), key$x)])
# gender
# 1 male
# 2 female
# 3 male
# 4 non-binary
# 5 non-binary
# 6 female
# 7 ...
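As a variation on the key table, the same lookup can be sketched with a named character vector in base R. The names `lookup`, `gender` and `recoded`, as well as the sample value "unknown", are assumptions made for this example. Spelling out each mapping by name avoids depending on the sort order of the unique values, and any value missing from the lookup shows up as NA.

```r
# Hypothetical lookup: names are the lower-cased raw values,
# elements are the target categories
lookup <- c(
  male = "male", man = "male",
  female = "female", woman = "female",
  other = "non-binary", mtf = "non-binary",
  ftm = "non-binary", androgyne = "non-binary"
)

# Hypothetical sample, including one value that is not in the lookup
gender <- c("Male", "Woman", "other", "MtF", "unknown")

# Index the lookup by name; unmatched values come back as NA
recoded <- unname(lookup[tolower(gender)])

recoded  # "male" "female" "non-binary" "non-binary" NA
```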
Data:
removed <- structure(list(gender = c("Male", "Woman", "Male", "other", "MtF",
"female", "male", "MtF", "Male", "man", "Man", "female", "other",
"Woman", "female", "MtF", "male", "Female", "other", "other",
"FtM", "female", "Woman", "Woman", "female", "male", "androgyne",
"man", "Female", "MtF")), class = "data.frame", row.names = c(NA,
-30L))