Custom mutate of a new column based on two other columns in R using dplyr

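My goal is to create a new df column whose values are based on two other columns. My dataset concerns recruitment into a study. I want a column that says whether a person took part in a given round of the study and, if so, whether it was their first participation, their second, their third, and so on (up to 8 rounds). At the moment I am trying this in dplyr with mutate(case_when()) and lag(). However, it does not work if a person misses a round and comes back later.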

person |  round  |  in_round  |
A        1           1
A        2           1
A        3           1
A        4           1
A        5           1
A        6           0
A        7           0
A        8           0
B        1           0
B        2           0
B        3           1
B        4           1
B        5           1
B        6           1
B        7           0
B        8           1

What I need is a separate column that, for each person, uses round and in_round to produce the following:

person |  round  |  in_round  |  round_status
A        1           1         recruited
A        2           1        follow_up_1
A        3           1        follow_up_2
A        4           1        follow_up_3
A        5           1        follow_up_4
A        6           0           none
A        7           0           none
A        8           0           none
B        1           0           none
B        2           0           none
B        3           1         recruited
B        4           1        follow_up_1
B        5           1        follow_up_2
B        6           1        follow_up_3
B        7           0            none
B        8           1        follow_up_4

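To summarise:

Where in_round == 0, round_status == "none"
The first time in_round == 1, round_status == "recruited"
Subsequent times in_round == 1, round_status == "follow_up_X" (where X depends on the number of previous waves the individual took part in)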

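Try this. cumsum(in_round) counts how many rounds each person has attended so far, so a skipped round does not break the numbering: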

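library(dplyr)

df %>%
  arrange(person, round) %>%   # sort so cumsum() runs in round order within each person
  group_by(person) %>%
  mutate(
    cum_round = cumsum(in_round),   # number of rounds attended so far
    round_status = case_when(
      in_round == 0 ~ "none",
      cum_round == 1 ~ "recruited",
      TRUE ~ paste0("follow_up_", cum_round - 1)
    )
  ) %>%
  ungroup()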
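
For anyone who wants to check this against the data in the question, here is a minimal reproducible sketch. The data frame is rebuilt by hand from the table above, and the name `result` is just an illustrative choice; it assumes in_round is stored as numeric 0/1.

```r
library(dplyr)

# Rebuild the example data from the question (typed in by hand)
df <- data.frame(
  person   = rep(c("A", "B"), each = 8),
  round    = rep(1:8, times = 2),
  in_round = c(1, 1, 1, 1, 1, 0, 0, 0,
               0, 0, 1, 1, 1, 1, 0, 1)
)

result <- df %>%
  arrange(person, round) %>%        # make sure rounds are in order per person
  group_by(person) %>%
  mutate(
    cum_round = cumsum(in_round),   # running count of rounds attended
    round_status = case_when(
      in_round == 0 ~ "none",       # not present this round
      cum_round == 1 ~ "recruited", # first round attended
      TRUE ~ paste0("follow_up_", cum_round - 1)
    )
  ) %>%
  ungroup()

print(result, n = 16)
```

Person B is the interesting case: they miss round 7 and return in round 8. Because the running count only goes up on rounds they attended, round 8 is labelled follow_up_4 and not restarted or skipped.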
