How to repeat a categorical variable determined at baseline across the full duration of longitudinal data



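For one of my analyses, I want to evaluate the course of hemoglobin levels (art_hb) over a certain duration (nmp_time, called time in the sample data) for different subjects (id). We want to assign each id to a hemoglobin category (0-3; 3-6; 6-9 and >9) based on its first measurement. So if the hemoglobin level changes over time, we do not want the category to change.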

My sample data:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA))

So far, I have managed to create the categories based on the hemoglobin measurement at time == 0:
df$art_hb_cat <- ifelse(df$art_hb < 3 & df$time == 0, "0-3", ifelse(df$art_hb >= 3 & df$art_hb < 6 & df$time == 0, "3-6", ifelse(df$art_hb >= 6 & df$art_hb < 9 & df$time == 0, "6-9", ifelse(df$art_hb > 9 & df$time == 0, ">9", ""))))

This results in:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA), art_hb_cat=c("3-6","","","6-9","",""))

Now I would like to copy these categories to every row of the same id (-> group_by(id)), to get a df like this:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA), art_hb_cat=c("3-6","3-6","3-6","6-9","6-9","6-9"))

But after several days of trying, I still have not managed it. Can anyone help me? Thank you very much.

P.S.: This is my first post, so I hope it is clear enough. Apologies if not.

We can use case_when after grouping by "id".

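library(dplyr)
df %>%
  group_by(id) %>%
  # only the time == 0 row gets a label; [1] then takes the first row of each
  # group, so this assumes the baseline row comes first within each id
  mutate(art_hb_cat = case_when(
    art_hb < 3 & time == 0 ~ "0-3",
    art_hb >= 3 & art_hb < 6 & time == 0 ~ "3-6",
    art_hb >= 6 & art_hb < 9 & time == 0 ~ "6-9",
    art_hb >= 9 & time == 0 ~ ">9"  # >= so a value of exactly 9 is not left as NA
  )[1]) %>%
  ungroup()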

-output

# A tibble: 6 × 4
id     time art_hb art_hb_cat
<fct> <dbl>  <dbl> <chr>     
1 1         0    5.8 3-6       
2 1        30    6.1 3-6       
3 1        60    5.9 3-6       
4 2         0    6.7 6-9       
5 2        30    6.9 6-9       
6 2        60   NA   6-9   
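
The answer above relies on the time == 0 row being the first row of each id. As a hedged variation (a sketch, not the answerer's code), the baseline value can be looked up explicitly per group and then categorised, so the result no longer depends on row order. The column name baseline_hb and the object names df_shuffled and res are hypothetical and introduced only for illustration.

```r
library(dplyr)

df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

# Shuffle the rows so the time == 0 row is no longer first within each id
df_shuffled <- df[c(3, 1, 2, 6, 5, 4), ]

res <- df_shuffled %>%
  group_by(id) %>%
  # baseline_hb (hypothetical helper column): the value measured at time == 0;
  # [1] yields NA if an id has no baseline row
  mutate(baseline_hb = art_hb[time == 0][1]) %>%
  ungroup() %>%
  # conditions are checked in order, so each one only needs the upper bound
  mutate(art_hb_cat = case_when(
    baseline_hb < 3  ~ "0-3",
    baseline_hb < 6  ~ "3-6",
    baseline_hb < 9  ~ "6-9",
    baseline_hb >= 9 ~ ">9"
  )) %>%
  arrange(id, time)

res
```

An id whose baseline measurement is missing ends up with NA as its category, which is usually easier to spot than a silently assigned bin.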

Or using data.table

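library(data.table)
# Categorise the time == 0 row of each id, then join the label back on id.
# Explicit comparisons are used instead of between(), which is inclusive on
# both ends; there is no default, so a missing baseline value stays NA
# rather than being labelled ">9".
setDT(df)[df[time == 0, .(id, art_hb_cat = fcase(
    art_hb < 3, "0-3",
    art_hb < 6, "3-6",
    art_hb < 9, "6-9",
    art_hb >= 9, ">9"))], on = .(id)]

-output
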
id  time art_hb art_hb_cat
<fctr> <num>  <num>     <char>
1:      1     0    5.8        3-6
2:      1    30    6.1        3-6
3:      1    60    5.9        3-6
4:      2     0    6.7        6-9
5:      2    30    6.9        6-9
6:      2    60     NA        6-9
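
If the goal is to add the column to the existing table instead of returning a joined copy, a data.table update join is one option. This is a minimal sketch under the same binning assumptions as the question; the object names dt and baseline are hypothetical.

```r
library(data.table)

dt <- data.table(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

# One row per id holding the category of the time == 0 measurement
baseline <- dt[time == 0, .(id, art_hb_cat = fcase(
  art_hb < 3,  "0-3",
  art_hb < 6,  "3-6",
  art_hb < 9,  "6-9",
  art_hb >= 9, ">9"
))]

# Update join: copy the baseline category onto every row with the same id,
# modifying dt by reference and keeping its row order
dt[baseline, on = .(id), art_hb_cat := i.art_hb_cat]

dt
```

Because := modifies dt in place, no reassignment is needed, and ids without a baseline row are simply left as NA.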

You can use cut instead of ifelse, and apply it to the art_hb value where time == 0 within each id group.

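library(dplyr)
df %>%
  group_by(id) %>%
  # cut() sees only the time == 0 value of each id; mutate() recycles that
  # single result across all rows of the group
  mutate(art_hb_cat = cut(art_hb[time == 0],
                          breaks = c(0, 3, 6, 9, Inf),
                          labels = c("0-3", "3-6", "6-9", ">9"),
                          right = FALSE))  # bins [0,3), [3,6), [6,9), [9,Inf) as in the question

-output
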
id     time art_hb art_hb_cat
<fct> <dbl>  <dbl> <fct>     
1 1         0    5.8 3-6       
2 1        30    6.1 3-6       
3 1        60    5.9 3-6       
4 2         0    6.7 6-9       
5 2        30    6.9 6-9       
6 2        60   NA   6-9       
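
One detail worth checking with cut is which side of each interval is closed. By default cut uses right = TRUE, giving (0,3], (3,6] and so on, whereas the bins in the question (>= 3 and < 6) correspond to right = FALSE. The sketch below shows the difference at the boundaries and then applies the same idea in base R only, using match to copy the baseline category to every row. The names breaks, labels, baseline and baseline_cat are hypothetical.

```r
df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

breaks <- c(0, 3, 6, 9, Inf)
labels <- c("0-3", "3-6", "6-9", ">9")

# Boundary values fall into different bins depending on `right`
default_bins <- as.character(cut(c(3, 6, 9), breaks = breaks, labels = labels))
# "0-3" "3-6" "6-9"
left_closed <- as.character(cut(c(3, 6, 9), breaks = breaks, labels = labels,
                                right = FALSE))
# "3-6" "6-9" ">9"

# Base R version: categorise the baseline rows once, then look the label up per id
baseline <- df[df$time == 0, ]
baseline_cat <- cut(baseline$art_hb, breaks = breaks, labels = labels,
                    right = FALSE)
df$art_hb_cat <- baseline_cat[match(df$id, baseline$id)]

df
```

This assumes at most one time == 0 row per id; match picks the first one if there are several.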
