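For one of my analyses, I want to evaluate the course of hemoglobin levels (art_hb) in different subjects (id) over a period of time (nmp_time). We want to assign each id to a hemoglobin category (0-3, 3-6, 6-9 and >9) based on its first measurement. So if the hemoglobin level changes over time, we do not want the category to change.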
My sample data:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA))
So far, I have managed to create the categories based on the hemoglobin measurement at time == 0:
df$art_hb_cat <- ifelse(df$art_hb < 3 & df$time == 0, "0-3",
                 ifelse(df$art_hb >= 3 & df$art_hb < 6 & df$time == 0, "3-6",
                 ifelse(df$art_hb >= 6 & df$art_hb < 9 & df$time == 0, "6-9",
                 ifelse(df$art_hb > 9 & df$time == 0, ">9", ""))))
This results in:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA), art_hb_cat=c("3-6","","","6-9","",""))
Now I would like to copy these categories to every row of the same id (-> group_by(id)), to get a df like this:
df <- data.frame(id=factor(c(1,1,1,2,2,2)), time=c(0,30,60,0,30,60), art_hb=c(5.8,6.1,5.9,6.7,6.9,NA), art_hb_cat=c("3-6","3-6","3-6","6-9","6-9","6-9"))
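But after several days of trying, I still have not managed it. Can anyone help me? Many thanks.
P.S.: This is my first post, so I hope it is clear enough. Apologies if not.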
We can use case_when after grouping by 'id':
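library(dplyr)
df %>%
  group_by(id) %>%
  # case_when() yields NA on rows where time != 0; [1] keeps the first row's
  # value, which assumes the time == 0 row comes first within each id
  mutate(art_hb_cat = case_when(
    art_hb < 3 & time == 0 ~ "0-3",
    art_hb >= 3 & art_hb < 6 & time == 0 ~ "3-6",
    art_hb >= 6 & art_hb < 9 & time == 0 ~ "6-9",
    art_hb >= 9 & time == 0 ~ ">9"  # >= so that exactly 9 is not left as NA
  )[1]) %>%
  ungroup()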
Output:
# A tibble: 6 × 4
  id     time art_hb art_hb_cat
  <fct> <dbl>  <dbl> <chr>
1 1         0    5.8 3-6
2 1        30    6.1 3-6
3 1        60    5.9 3-6
4 2         0    6.7 6-9
5 2        30    6.9 6-9
6 2        60   NA   6-9
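The [1] trick above relies on the time == 0 row being the first row of each id. If the row order is not guaranteed, one option is to pull out the baseline value explicitly before categorising. This is only a sketch under that assumption; baseline_hb and res are hypothetical names introduced here, not part of the original answer.

```r
library(dplyr)

df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

res <- df %>%
  group_by(id) %>%
  # baseline_hb (hypothetical helper column): art_hb at time == 0 for this id;
  # [1] yields NA if an id has no time == 0 row
  mutate(baseline_hb = art_hb[time == 0][1],
         # conditions are checked in order, so each one only needs an upper bound
         art_hb_cat = case_when(baseline_hb < 3 ~ "0-3",
                                baseline_hb < 6 ~ "3-6",
                                baseline_hb < 9 ~ "6-9",
                                baseline_hb >= 9 ~ ">9")) %>%
  ungroup() %>%
  select(-baseline_hb)
```

Because the category is derived from the baseline value rather than from row position, shuffling the rows of df should not change the result.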
Or use data.table:
library(data.table)
# categorise the time == 0 rows, then join the category back onto all rows by id
setDT(df)[df[time == 0, .(id, art_hb_cat = fcase(between(art_hb, 0, 3), "0-3",
                                                 between(art_hb, 3, 6), "3-6",
                                                 between(art_hb, 6, 9), "6-9",
                                                 default = ">9"))],
          on = .(id)]
       id  time art_hb art_hb_cat
   <fctr> <num>  <num>     <char>
1:      1     0    5.8        3-6
2:      1    30    6.1        3-6
3:      1    60    5.9        3-6
4:      2     0    6.7        6-9
5:      2    30    6.9        6-9
6:      2    60     NA        6-9
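Two details of the data.table version are worth knowing: between() includes both bounds by default, so a baseline of exactly 3 or 6 lands in the lower range because fcase() takes the first match, and default = ">9" is also what a missing baseline would receive. If that is not wanted, a possible variant is to assign by reference within each id using explicit comparisons. This is a sketch only; dt and hb0 are hypothetical names.

```r
library(data.table)

dt <- data.table(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

dt[, art_hb_cat := {
  hb0 <- art_hb[time == 0][1]  # hb0 (hypothetical): baseline value for this id
  # first matching condition wins; with no default, a missing baseline gives NA
  fcase(hb0 < 3, "0-3",
        hb0 < 6, "3-6",
        hb0 < 9, "6-9",
        hb0 >= 9, ">9")
}, by = id]
```

Here the single category computed per id is recycled to all rows of that id, so no join is needed.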
You can use cut instead of ifelse, and apply it to the art_hb value where time == 0 within each id group:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(art_hb_cat = cut(art_hb[time == 0],
                          breaks = c(0, 3, 6, 9, Inf),
                          labels = c("0-3", "3-6", "6-9", ">9")))
  id     time art_hb art_hb_cat
  <fct> <dbl>  <dbl> <fct>
1 1         0    5.8 3-6
2 1        30    6.1 3-6
3 1        60    5.9 3-6
4 2         0    6.7 6-9
5 2        30    6.9 6-9
6 2        60   NA   6-9
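One thing to be aware of: cut() uses right-closed intervals by default, so a baseline of exactly 3 would fall into "0-3", whereas the ifelse conditions in the question put it into "3-6". Passing right = FALSE gives left-closed intervals instead. The following base R sketch shows this together with a lookup via match(); baseline is a hypothetical helper data frame, and the sketch assumes each id has at most one time == 0 row.

```r
df <- data.frame(id = factor(c(1, 1, 1, 2, 2, 2)),
                 time = c(0, 30, 60, 0, 30, 60),
                 art_hb = c(5.8, 6.1, 5.9, 6.7, 6.9, NA))

# baseline (hypothetical helper): one row per id, taken at time == 0
baseline <- df[df$time == 0, c("id", "art_hb")]

# right = FALSE makes the intervals [0,3), [3,6), [6,9), [9,Inf)
baseline$art_hb_cat <- cut(baseline$art_hb,
                           breaks = c(0, 3, 6, 9, Inf),
                           labels = c("0-3", "3-6", "6-9", ">9"),
                           right = FALSE)

# look up each row's id in the baseline table and copy the category across
df$art_hb_cat <- baseline$art_hb_cat[match(df$id, baseline$id)]
```

An id without a time == 0 row would simply get NA here, rather than raising an error.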