我想为满足另一个条件的特定组更改满足某个条件的某些行。
目的:
例如,我试图从下面的数据集中提取母亲的名字,并将其应用于标记为"儿童"的行旁边,仅用于在下面的数据集中具有所有"一夫一妻制"的组:
df <- tribble(
~family, ~sequence, ~role, ~state, ~name, ~year_of_birth,
"A", 1, "Father", "Monogamist", "Adam", 1980,
"A", 2, "Mother", "Monogamist", "Sarah", 1981,
"A", 3, "Child", "Monogamist", "Omar", 2000,
"A", 4, "Child", "Monogamist", "Joseph", 2001,
"B", 1, "Father", "Polygamist", "Ali", 1990,
"B", 2, "Mother", "Polygamist", "Miriam", 1998,
"B", 2, "Child", "Polygamist", "Noah", 1992,
"B", 3, "Child", "Polygamist", "Jacob", 1998,
"B", 4, "Child", "Polygamist", "Layla", 2014,
"C", 1, "Father", "Widower", "Ibrahim", 2020,
"C", 3, "Child", "Widower", "Zakariya", 2021,
"C", 4, "Child", "Widower", "Kahlid", 2022,
)
问题:
问题是有一个妻子或没有妻子的情况较多,导致我的尝试/尝试失败。
我尝试:
我的尝试是有条件地改变这样的行链接:
# Not-working
df %>%
group_by(family) %>%
mutate(mother_name = case_when(!sequence %in% c(1,2) ~ name[sequence == 2],TRUE ~ ""))
Error in `mutate()`:
! Problem while computing `mother_name = case_when(...)`.
i The error occurred in group 2: family = "B".
Caused by error in `case_when()`:
! `!sequence %in% c(1, 2) ~ name[sequence == 2]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.
# Working
df %>%
group_by(family) %>%
filter(any(state == "Monogamist")) %>%
mutate(mother_name = case_when(!sequence %in% c(1,2) ~ name[sequence == 2],TRUE ~ ""))
# A tibble: 4 x 7
# Groups: family [1]
family sequence role state name year_of_birth mother_name
<chr> <dbl> <chr> <chr> <chr> <dbl> <chr>
1 A 1 Father Monogamist Adam 1980 ""
2 A 2 Mother Monogamist Sarah 1981 ""
3 A 3 Child Monogamist Omar 2000 "Sarah"
4 A 4 Child Monogamist Joseph 2001 "Sarah"
预期输出
如何实现以下输出。把母亲的名字和出生年份按升序排列就好了。将any(state == "Monogamist")
条件添加到case_when
条件中是使我卡住的原因。
family sequence role state name year_of_birth mother_name
1 A 1 Father Monogamist Adam 1980 NA
2 A 2 Mother Monogamist Sarah 1981 Sarah
3 A 3 Child Monogamist Omar 2000 NA
4 A 4 Child Monogamist Joseph 2001 NA
5 B 1 Father Polygamist Ali 1990 NA
6 B 2 Mother Polygamist Miriam 1998 Fatima, Miriam
7 B 2 Child Polygamist Fatima 1992 Fatima, Miriam
8 B 3 Child Polygamist Jacob 1998 NA
9 B 4 Child Polygamist Layla 2014 NA
10 C 1 Father Widower Ibrahim 2020 NA
11 C 3 Child Widower Zakariya 2021 NA
12 C 4 Child Widower Kahlid 2022 NA
解决方案如下:
- 提取母亲的名字
- 应用于标记为'children'的行
- 处理发生的更多的妻子或没有
- 按出生分组对数据进行排序。
使用role
:的解决方案
df |>
group_by(family) |>
arrange(year_of_birth) |>
mutate(mother_name = ifelse(role == "Child" & !is_empty(name[role == "Mother"]),
paste(name[role == "Mother"], collapse = ", "),
NA))
或使用sequence
:
df |>
group_by(family) |>
arrange(family, year_of_birth) |>
mutate(mother_name = ifelse(sequence > 2 & !is_empty(name[sequence == 2]),
paste(name[sequence == 2], collapse=", "),
NA)) |>
ungroup()
输出:
# A tibble: 12 × 7
family sequence role state name year_of_birth mother_name
<chr> <dbl> <chr> <chr> <chr> <dbl> <chr>
1 A 1 Father Monogamist Adam 1980 NA
2 A 2 Mother Monogamist Sarah 1981 NA
3 A 3 Child Monogamist Omar 2000 Sarah
4 A 4 Child Monogamist Joseph 2001 Sarah
5 B 1 Father Polygamist Ali 1990 NA
6 B 2 Mother Polygamist Fatima 1992 NA
7 B 2 Mother Polygamist Miriam 1998 NA
8 B 3 Child Polygamist Jacob 1998 Fatima, Miriam
9 B 4 Child Polygamist Layla 2014 Fatima, Miriam
10 C 1 Father Widower Ibrahim 2020 NA
11 C 3 Child Widower Zakariya 2021 NA
12 C 4 Child Widower Kahlid 2022 NA
根据您的指示进行新数据:
df <- tribble(
~family, ~sequence, ~role, ~state, ~name,
"A", 1, "Father", "Monogamist", "Adam",
"A", 2, "Mother", "Monogamist", "Sarah",
"A", 3, "Child", "Monogamist", "Omar",
"A", 4, "Child", "Monogamist", "Joseph",
"B", 1, "Father", "Polygamist", "Ali",
"B", 2, "Mother", "Polygamist", "Miriam",
"B", 2, "Mother", "Polygamist", "Fatima",
"B", 3, "Child", "Polygamist", "Jacob",
"B", 4, "Child", "Polygamist", "Layla",
"C", 1, "Father", "Widower", "Ibrahim",
"C", 3, "Child", "Widower", "Zakariya",
"C", 4, "Child", "Widower", "Kahlid"
) |> add_column(year_of_birth = c(1980, 1981, 2000, 2001, 1990, 1998, 1992, 1998, 2014, 2020, 2021, 2022))