I have a large data frame in which three variables have the following structure:
author1_gender <- c("Men", "Men", "Women")
author2_gender <- c("Women", "Men", "Women")
author3_gender <- c("Men", "Men", "Women")
genders <- tibble(author1_gender, author2_gender, author3_gender)
which gives
# A tibble: 3 × 3
author1_gender author2_gender author3_gender
<chr> <chr> <chr>
1 Men Women Men
2 Men Men Men
3 Women Women Women
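I would like to create a new column based on whether a row contains mixed genders, i.e. whether the three values in each row are all equal. Ideally, I would like to add a column next to the three existing ones indicating whether the row is women only, men only, or mixed gender, i.e.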
# A tibble: 3 × 4
author1_gender author2_gender author3_gender gender_mix
<chr> <chr> <chr> <chr>
1 Men Women Men mix
2 Men Men Men men
3 Women Women Women women
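If I had two values, I could do this with identical(), but I can't seem to find how to do it with three values. Can anyone help me with this probably trivial problem?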
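You can find the row-wise minimum and maximum across the columns whose names end in "gender". If the minimum equals the maximum, all values in the row are the same, so return that value; otherwise return "mix".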
library(dplyr, warn.conflicts = FALSE)
author1_gender <- c("Men", "Men", "Women")
author2_gender <- c("Women", "Men", "Women")
author3_gender <- c("Men", "Men", "Women")
genders <- tibble(author1_gender, author2_gender, author3_gender)
genders %>%
  mutate(
    gender_mix =
      lapply(c(pmax, pmin), do.call, across(ends_with('gender'))) %>%
      {if_else(Reduce('==', .), .[[1]], 'mix')}
  )
#> # A tibble: 3 × 4
#> author1_gender author2_gender author3_gender gender_mix
#> <chr> <chr> <chr> <chr>
#> 1 Men Women Men mix
#> 2 Men Men Men Men
#> 3 Women Women Women Women
Created on 2021-12-07 by the reprex package (v2.0.1)
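The output above keeps the original capitalisation ("Men", "Women"), whereas the question asks for lowercase labels. A minimal sketch of one way to get them, assuming that lowercasing with tolower() is acceptable; the helper columns lo and hi are names chosen here for illustration and are dropped at the end:

```r
library(dplyr, warn.conflicts = FALSE)

genders <- tibble(
  author1_gender = c("Men", "Men", "Women"),
  author2_gender = c("Women", "Men", "Women"),
  author3_gender = c("Men", "Men", "Women")
)

result <- genders %>%
  mutate(
    # row-wise smallest and largest value across the gender columns
    lo = do.call(pmin, across(ends_with("gender"))),
    hi = do.call(pmax, across(ends_with("gender"))),
    # equal extremes mean every value in the row is the same
    gender_mix = if_else(lo == hi, tolower(lo), "mix")
  ) %>%
  select(-lo, -hi)  # drop the illustrative helper columns

result$gender_mix
```

Splitting the minimum and maximum into separate columns does the same comparison as the lapply()/Reduce() version, just in a form that is easier to inspect step by step.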
If you have NA values, you can add the na.rm = TRUE argument to pmin and pmax:
genders %>%
  mutate(
    gender_mix =
      lapply(c(pmax, pmin), do.call,
             c(across(ends_with('gender')), na.rm = TRUE)) %>%
      {if_else(Reduce('==', .), .[[1]], 'mix')}
  )
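To see what na.rm = TRUE changes, here is a small sketch on made-up data. The tibble genders_na and the helper columns lo and hi are hypothetical and not part of the original question. Missing authors are ignored, so a row is labelled by the values that are present.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: same layout as `genders`, with some authors missing
genders_na <- tibble(
  author1_gender = c("Men", "Men", "Women"),
  author2_gender = c("Women", NA, "Women"),
  author3_gender = c(NA, "Men", NA)
)

result_na <- genders_na %>%
  mutate(
    # na.rm = TRUE makes pmin/pmax skip NA entries within each row
    lo = do.call(pmin, c(across(ends_with("gender")), na.rm = TRUE)),
    hi = do.call(pmax, c(across(ends_with("gender")), na.rm = TRUE)),
    gender_mix = if_else(lo == hi, lo, "mix")
  ) %>%
  select(-lo, -hi)  # drop the illustrative helper columns

result_na$gender_mix
```

A row in which every column is NA still gives NA, because pmin and pmax return NA when all corresponding elements are missing.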
The same check can also be written with the three columns spelled out:
genders %>%
  mutate(gender_mix = ifelse(
    pmin(author1_gender, author2_gender, author3_gender) ==
      pmax(author1_gender, author2_gender, author3_gender),
    author1_gender, "mix"
  ))
# A tibble: 3 x 4
author1_gender author2_gender author3_gender gender_mix
<chr> <chr> <chr> <chr>
1 Men Women Men mix
2 Men Men Men Men
3 Women Women Women Women