I am looking for a dplyr solution.
Suppose I have this data frame p:
id treatment age sex progression pfs
1 1 SSTR 31.3 0 0 15.6
2 2 SSTR 36.9 0 1 8.9
3 3 SSTR 44.6 1 1 25.5
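Each patient (id) received a treatment and was followed until progression (p$progression == 1) or without progression (p$progression == 0); p$pfs is the follow-up time in months.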
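I want to split each row into multiple rows, one per 12-month interval of follow-up.
Some covariates stay the same across a patient's rows, while others change.
Constant covariates: p$id, p$treatment, p$sex (and others in my data frame).
Changing covariates:
- p$age should increase by 1 with each new interval (each patient ages 1 year per 12 months).
- p$pfs is split into two new covariates, p$start and p$stop, which mark the bounds of each 12-month interval.
- A new covariate p$interval holds the interval number (0 - 12 months is interval == 1, 12 - 24 months is interval == 2, 24 - 36 months is interval == 3, and so on).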
Expected output:
id treatment sex age progression start stop interval
1 1 SSTR 0 31.3 0 0 12.0 1
2 1 SSTR 0 32.3 0 12 15.6 2
3 2 SSTR 0 36.9 1 0 8.9 1
4 3 SSTR 1 44.6 0 0 12.0 1
5 3 SSTR 1 45.6 0 12 24.0 2
6 3 SSTR 1 46.6 1 24 25.5 3
Data:
p <- structure(list(id = 1:3, treatment = structure(c(1L, 1L, 1L), levels = c("SSTR",
"SSA", "Control"), class = "factor"), age = c(31.3, 36.9, 44.6
), sex = structure(c(1L, 1L, 2L), levels = c("0", "1"), class = "factor"),
progression = c(0L, 1L, 1L), pfs = c(15.6, 8.9, 25.5)), row.names = c(NA,
3L), class = "data.frame")
Using a list column and unnest:
library(dplyr)
library(purrr)
library(tidyr)
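p %>%
  # one list element per patient holding the interval numbers 1, 2, ..., n
  mutate(interval = map(pfs %/% 12L + 1L, seq_len)) %>%
  # expand the list column: one row per patient and interval
  unnest(interval) %>%
  mutate(start = 12L * (interval - 1L),
         stop = pmin(pfs, 12L * interval),   # last interval ends at pfs
         age = age + (interval - 1L)) %>%    # one year older per interval
  group_by(id) %>%
  # progression can only occur in a patient's last interval
  mutate(progression = if_else(interval != max(interval), 0L, progression)) %>%
  select(id, treatment, sex, age, progression, start, stop, interval)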
# # A tibble: 6 × 8
# # Groups: id [3]
# id treatment sex age progression start stop interval
# <int> <fct> <fct> <dbl> <int> <int> <dbl> <int>
# 1 1 SSTR 0 31.3 0 0 12 1
# 2 1 SSTR 0 32.3 0 12 15.6 2
# 3 2 SSTR 0 36.9 1 0 8.9 1
# 4 3 SSTR 1 44.6 0 0 12 1
# 5 3 SSTR 1 45.6 0 12 24 2
# 6 3 SSTR 1 46.6 1 24 25.5 3
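The idea is to create a list column that holds the interval counters and then unnest it (i.e., expand it into one row per interval). Here is the solution in slow motion: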
p %>%
  mutate(interval = map(pfs %/% 12L + 1L, seq_len))
# id treatment age sex progression pfs interval
# 1 1 SSTR 31.3 0 0 15.6 1, 2
# 2 2 SSTR 36.9 0 1 8.9 1
# 3 3 SSTR 44.6 1 1 25.5 1, 2, 3
p %>%
  mutate(interval = map(pfs %/% 12L + 1L, seq_len)) %>%
  unnest(interval)
# # A tibble: 6 × 7
# id treatment age sex progression pfs interval
# <int> <fct> <dbl> <fct> <int> <dbl> <int>
# 1 1 SSTR 31.3 0 0 15.6 1
# 2 1 SSTR 31.3 0 0 15.6 2
# 3 2 SSTR 36.9 0 1 8.9 1
# 4 3 SSTR 44.6 1 1 25.5 1
# 5 3 SSTR 44.6 1 1 25.5 2
# 6 3 SSTR 44.6 1 1 25.5 3
The rest is fairly straightforward.
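One edge case may be worth checking; this is a sketch under my own assumptions, not part of the answer above. When pfs is an exact multiple of 12 (say 24), pfs %/% 12L + 1L yields 3 intervals, and the last one runs from 24 to 24 with zero length. Assuming such empty intervals are unwanted, ceiling(pfs / 12) gives the count without them. The data frame q is a hypothetical patient, and the pmax(..., 1) guard is an added assumption so that a patient with pfs == 0 would still keep one row.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical patient whose follow-up ends exactly on an interval boundary
q <- data.frame(id = 4L, age = 50, progression = 1L, pfs = 24)

res <- q %>%
  # ceiling() avoids the zero-length third interval that %/% ... + 1 would create;
  # pmax(..., 1) is an assumption: keep at least one row per patient
  mutate(interval = map(pmax(ceiling(pfs / 12), 1), seq_len)) %>%
  unnest(interval) %>%
  mutate(start = 12L * (interval - 1L),
         stop  = pmin(pfs, 12L * interval),
         age   = age + (interval - 1L)) %>%
  group_by(id) %>%
  mutate(progression = if_else(interval != max(interval), 0L, progression)) %>%
  ungroup()

res
```

For the three patients in p the interval counts are the same with either formula, since none of their pfs values falls exactly on a multiple of 12.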