R: conditionally modify rows of specific groups



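I want to change the rows that meet one condition, but only within groups that meet another condition.

Goal:

For example, I am trying to extract the mother's name from the dataset below and put it next to the rows labelled "Child", only for groups in which every row is "Monogamist":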

library(tidyverse)  # attaches dplyr, tibble (tribble) and purrr (is_empty)

df <- tribble(
~family, ~sequence, ~role,     ~state,        ~name,     ~year_of_birth,
"A",      1,       "Father",  "Monogamist",  "Adam",     1980,
"A",      2,       "Mother",  "Monogamist",  "Sarah",    1981,
"A",      3,       "Child",   "Monogamist",  "Omar",     2000,
"A",      4,       "Child",   "Monogamist",  "Joseph",   2001,
"B",      1,       "Father",  "Polygamist",  "Ali",      1990,
"B",      2,       "Mother",  "Polygamist",  "Miriam",   1998,
"B",      2,       "Child",   "Polygamist",  "Noah",     1992,
"B",      3,       "Child",   "Polygamist",  "Jacob",    1998,
"B",      4,       "Child",   "Polygamist",  "Layla",    2014,
"C",      1,       "Father",  "Widower",     "Ibrahim",  2020,
"C",      3,       "Child",   "Widower",     "Zakariya", 2021,
"C",      4,       "Child",    "Widower",     "Kahlid",  2022,
)

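Problem:

Some families have more than one wife, and some have none at all, which makes my attempt fail.

What I tried:

My attempt was to mutate the rows conditionally, like this: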


# Not working
df %>%
  group_by(family) %>%
  mutate(mother_name = case_when(!sequence %in% c(1, 2) ~ name[sequence == 2], TRUE ~ ""))

Error in `mutate()`:
! Problem while computing `mother_name = case_when(...)`.
i The error occurred in group 2: family = "B".
Caused by error in `case_when()`:
! `!sequence %in% c(1, 2) ~ name[sequence == 2]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.

# Working
df %>%
  group_by(family) %>%
  filter(any(state == "Monogamist")) %>%
  mutate(mother_name = case_when(!sequence %in% c(1, 2) ~ name[sequence == 2], TRUE ~ ""))

# A tibble: 4 x 7
# Groups:   family [1]
family sequence role   state      name   year_of_birth mother_name
<chr>     <dbl> <chr>  <chr>      <chr>          <dbl> <chr>      
1 A             1 Father Monogamist Adam            1980 ""         
2 A             2 Mother Monogamist Sarah           1981 ""         
3 A             3 Child  Monogamist Omar            2000 "Sarah"    
4 A             4 Child  Monogamist Joseph          2001 "Sarah"      

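Expected output

How can I get the output below? It would be nice to have the mothers' names sorted in ascending order of year of birth. Adding the any(state == "Monogamist") condition to the case_when conditions is where I got stuck.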

family sequence   role      state     name year_of_birth    mother_name
1       A        1 Father Monogamist     Adam          1980             NA
2       A        2 Mother Monogamist    Sarah          1981          Sarah
3       A        3  Child Monogamist     Omar          2000             NA
4       A        4  Child Monogamist   Joseph          2001             NA
5       B        1 Father Polygamist      Ali          1990             NA
6       B        2 Mother Polygamist   Miriam          1998 Fatima, Miriam
7       B        2  Child Polygamist   Fatima          1992 Fatima, Miriam
8       B        3  Child Polygamist    Jacob          1998             NA
9       B        4  Child Polygamist    Layla          2014             NA
10      C        1 Father    Widower  Ibrahim          2020             NA
11      C        3  Child    Widower Zakariya          2021             NA
12      C        4  Child    Widower   Kahlid          2022             NA

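The solution below:

  • extracts the mothers' names
  • applies them to the rows labelled "Child"
  • handles families with more than one wife or with none
  • sorts the data by year of birth within each family.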
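
The third point is what broke the original attempt: inside a grouped mutate(), each value has to be length 1 or the size of the group, and name[sequence == 2] has length 2 in family B and length 0 in family C. A minimal base R sketch of why collapsing the names helps (the vectors and the helper collapse_or_na() are illustrative, not part of the original answer):

```r
# Names found at sequence == 2 in three hypothetical families
one_mother  <- c("Sarah")
two_mothers <- c("Fatima", "Miriam")
no_mother   <- character(0)

# Collapsing always yields exactly one string, so it can be recycled
paste(one_mother,  collapse = ", ")  # "Sarah"
paste(two_mothers, collapse = ", ")  # "Fatima, Miriam"
paste(no_mother,   collapse = ", ")  # "" (an empty string, not NA)

# Hypothetical helper: return NA instead of "" when there is no mother
collapse_or_na <- function(x) {
  if (length(x) == 0) NA_character_ else paste(x, collapse = ", ")
}
```

Because the empty case collapses to "" rather than NA, the solutions below test for an empty vector explicitly before using the pasted value.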

Solution using role:

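df |>
  group_by(family) |>
  # arrange() ignores the grouping unless .by_group = TRUE
  arrange(year_of_birth, .by_group = TRUE) |>
  # is_empty() comes from purrr, which library(tidyverse) attaches
  mutate(mother_name = ifelse(role == "Child" & !is_empty(name[role == "Mother"]),
                              paste(name[role == "Mother"], collapse = ", "),
                              NA)) |>
  ungroup()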

Or using sequence:

df |>
  group_by(family) |>
  arrange(family, year_of_birth) |>
  mutate(mother_name = ifelse(sequence > 2 & !is_empty(name[sequence == 2]),
                              paste(name[sequence == 2], collapse = ", "),
                              NA)) |>
  ungroup()

Output:

# A tibble: 12 × 7
family sequence role   state      name     year_of_birth mother_name   
<chr>     <dbl> <chr>  <chr>      <chr>            <dbl> <chr>         
1 A             1 Father Monogamist Adam              1980 NA            
2 A             2 Mother Monogamist Sarah             1981 NA            
3 A             3 Child  Monogamist Omar              2000 Sarah         
4 A             4 Child  Monogamist Joseph            2001 Sarah         
5 B             1 Father Polygamist Ali               1990 NA            
6 B             2 Mother Polygamist Fatima            1992 NA            
7 B             2 Mother Polygamist Miriam            1998 NA            
8 B             3 Child  Polygamist Jacob             1998 Fatima, Miriam
9 B             4 Child  Polygamist Layla             2014 Fatima, Miriam
10 C             1 Father Widower    Ibrahim           2020 NA            
11 C             3 Child  Widower    Zakariya          2021 NA            
12 C             4 Child  Widower    Kahlid            2022 NA            
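
The output above fills in mother_name for every family that has a mother. If the restriction from the question is wanted (only families in which every row is "Monogamist"), one possible sketch is to put all(state == "Monogamist") directly inside case_when(). The reduced data set and the names families and monogamist_only are illustrative assumptions, not part of the original answer:

```r
library(dplyr)
library(tibble)

# Reduced copy of the example data: one monogamous and one polygamous family
families <- tribble(
  ~family, ~role,    ~state,       ~name,    ~year_of_birth,
  "A",     "Father", "Monogamist", "Adam",   1980,
  "A",     "Mother", "Monogamist", "Sarah",  1981,
  "A",     "Child",  "Monogamist", "Omar",   2000,
  "B",     "Father", "Polygamist", "Ali",    1990,
  "B",     "Mother", "Polygamist", "Miriam", 1998,
  "B",     "Mother", "Polygamist", "Fatima", 1992,
  "B",     "Child",  "Polygamist", "Layla",  2014
)

monogamist_only <- families |>
  group_by(family) |>
  mutate(mother_name = case_when(
    # all() gives one value per group, recycled across the group's rows
    role == "Child" & all(state == "Monogamist") ~
      paste(name[role == "Mother"], collapse = ", "),
    TRUE ~ NA_character_
  )) |>
  ungroup()
```

Here only Omar's row receives "Sarah"; every row of family B stays NA because the group-level condition is FALSE. The right-hand side is always a single string, so the length error from the question does not occur.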

New data, following your instructions:

df <- tribble(
~family, ~sequence, ~role,     ~state,        ~name,
"A",      1,       "Father",  "Monogamist",  "Adam",
"A",      2,       "Mother",  "Monogamist",  "Sarah",
"A",      3,       "Child",   "Monogamist",  "Omar",
"A",      4,       "Child",   "Monogamist",  "Joseph",
"B",      1,       "Father",  "Polygamist",  "Ali",
"B",      2,       "Mother",  "Polygamist",  "Miriam",
"B",      2,       "Mother",   "Polygamist", "Fatima",
"B",      3,       "Child",   "Polygamist",  "Jacob",
"B",      4,       "Child",   "Polygamist",  "Layla",
"C",      1,       "Father",  "Widower",     "Ibrahim",
"C",      3,       "Child",   "Widower",     "Zakariya",
"C",      4,       "Child",    "Widower",     "Kahlid"
) |> add_column(year_of_birth = c(1980, 1981, 2000, 2001, 1990, 1998, 1992, 1998, 2014, 2020, 2021, 2022))
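
As an alternative to the grouped mutate(), the same result can be sketched with a lookup table of mothers per family that is joined back onto the data. This is a different technique from the answer above, shown under the assumption that a join is acceptable; the names people, mothers and with_mothers are illustrative:

```r
library(dplyr)
library(tibble)

# Small subset of the data: one family with two mothers, one with none
people <- tribble(
  ~family, ~role,    ~name,      ~year_of_birth,
  "B",     "Father", "Ali",      1990,
  "B",     "Mother", "Miriam",   1998,
  "B",     "Mother", "Fatima",   1992,
  "B",     "Child",  "Jacob",    1998,
  "C",     "Father", "Ibrahim",  2020,
  "C",     "Child",  "Zakariya", 2021
)

# One row per family that has at least one mother, oldest mother first
mothers <- people |>
  filter(role == "Mother") |>
  arrange(family, year_of_birth) |>
  group_by(family) |>
  summarise(mother_name = paste(name, collapse = ", "), .groups = "drop")

with_mothers <- people |>
  # families without a mother get NA from the join
  left_join(mothers, by = "family") |>
  # keep the joined value on Child rows only
  mutate(mother_name = if_else(role == "Child", mother_name, NA_character_))
```

Families without a mother never appear in mothers, so the left join leaves their rows as NA without an explicit emptiness check.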
