R - How to split each row into multiple rows by time interval, keeping some covariates fixed while changing others



I am looking for a dplyr solution.

Suppose I have this data frame p:

  id treatment  age sex progression  pfs
1  1      SSTR 31.3   0           0 15.6
2  2      SSTR 36.9   0           1  8.9
3  3      SSTR 44.6   1           1 25.5

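Each patient (id) receives a treatment and is followed up until either progression (p$progression == 1) or the end of follow-up without progression (p$progression == 0). p$pfs is the follow-up time in months.

I want to split each row into multiple rows, one for each 12-month interval of follow-up.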

Some covariates stay the same in every row for a patient, while others change.

Unchanged covariates

  • p$id
  • p$treatment
  • p$sex

(and others in my real data frame)

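Changing covariates

  • p$age should increase by 1 with each new interval (since every patient gets one year older per 12 months)
  • p$pfs is split into two new covariates, p$start and p$stop, which mark the bounds of each 12-month interval
  • a new covariate p$interval gives the interval number (0 to 12 months is interval == 1, 12 to 24 months is interval == 2, 24 to 36 months is interval == 3, and so on)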
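To make the rule concrete, here is a minimal base-R sketch of the arithmetic for a single patient (the values of id 3 above). The names `width` and `age_at` are made up for illustration, and this only shows the intended splitting rule, not the dplyr solution being asked for.

```r
# Hypothetical single-patient illustration (values taken from id 3)
pfs   <- 25.5   # follow-up time in months
age   <- 44.6   # age at the start of follow-up
width <- 12     # interval length in months (assumed name)

interval <- seq_len(pfs %/% width + 1)   # interval counter: 1, 2, 3
start    <- width * (interval - 1)       # lower bound of each interval
stop     <- pmin(pfs, width * interval)  # upper bound, capped at pfs
age_at   <- age + (interval - 1)         # one year older per interval

data.frame(interval, start, stop, age = age_at)
```

The last interval is shorter than 12 months because `stop` is capped at the patient's own follow-up time.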

Expected output

  id treatment sex  age progression start stop interval
1  1      SSTR   0 31.3           0     0 12.0        1
2  1      SSTR   0 32.3           0    12 15.6        2
3  2      SSTR   0 36.9           1     0  8.9        1
4  3      SSTR   1 44.6           0     0 12.0        1
5  3      SSTR   1 45.6           0    12 24.0        2
6  3      SSTR   1 46.6           1    24 25.5        3

Data
p <- structure(list(
  id = 1:3,
  treatment = structure(c(1L, 1L, 1L), levels = c("SSTR", "SSA", "Control"), class = "factor"),
  age = c(31.3, 36.9, 44.6),
  sex = structure(c(1L, 1L, 2L), levels = c("0", "1"), class = "factor"),
  progression = c(0L, 1L, 1L),
  pfs = c(15.6, 8.9, 25.5)
), row.names = c(NA, 3L), class = "data.frame")

Using a list column and unnest:

library(dplyr)
library(purrr)
library(tidyr)
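p %>% 
  # one list element per patient: 1, 2, ..., number of 12-month intervals
  mutate(interval = map(pfs %/% 12L + 1L, seq_len)) %>% 
  # expand the list column into one row per interval
  unnest(interval) %>% 
  mutate(start = 12L * (interval - 1L),
         stop = pmin(pfs, 12L * interval),
         age = age + (interval - 1L)) %>%
  group_by(id) %>% 
  # progression is only kept in the last interval of each patient
  mutate(progression = if_else(interval != max(interval), 0L, progression)) %>% 
  select(id, treatment, sex, age, progression, start, stop, interval)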
# # A tibble: 6 × 8
# # Groups:   id [3]
#      id treatment sex     age progression start  stop interval
#   <int> <fct>     <fct> <dbl>       <int> <int> <dbl>    <int>
# 1     1 SSTR      0      31.3           0     0  12          1
# 2     1 SSTR      0      32.3           0    12  15.6        2
# 3     2 SSTR      0      36.9           1     0   8.9        1
# 4     3 SSTR      1      44.6           0     0  12          1
# 5     3 SSTR      1      45.6           0    12  24          2
# 6     3 SSTR      1      46.6           1    24  25.5        3

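The idea is to create a list column holding the interval counters for each patient and then unnest it (i.e. expand it into one row per interval). Here is the solution in slow motion: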

p %>% 
  mutate(interval = map(pfs %/% 12L + 1L, seq_len)) 
#   id treatment  age sex progression  pfs interval
# 1  1      SSTR 31.3   0           0 15.6     1, 2
# 2  2      SSTR 36.9   0           1  8.9        1
# 3  3      SSTR 44.6   1           1 25.5  1, 2, 3
p %>% 
  mutate(interval = map(pfs %/% 12L + 1L, seq_len)) %>% 
  unnest(interval)
# # A tibble: 6 × 7
#      id treatment   age sex   progression   pfs interval
#   <int> <fct>     <dbl> <fct>       <int> <dbl>    <int>
# 1     1 SSTR       31.3 0               0  15.6        1
# 2     1 SSTR       31.3 0               0  15.6        2
# 3     2 SSTR       36.9 0               1   8.9        1
# 4     3 SSTR       44.6 1               1  25.5        1
# 5     3 SSTR       44.6 1               1  25.5        2
# 6     3 SSTR       44.6 1               1  25.5        3

The rest is fairly straightforward.
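One caveat worth checking against your own data: with `pfs %/% 12L + 1L`, a follow-up time that falls exactly on a boundary (say pfs = 24) yields 24 %/% 12 + 1 = 3 intervals, the last of which runs from 24 to 24 and has zero length. Whether that matters depends on the downstream analysis. The sketch below shows one possible variant based on `ceiling()`. The toy data frame `q` and the helper `n_intervals()` are hypothetical and not part of the solution above.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical toy data: id 1 ends exactly on a 12-month boundary
q <- data.frame(id = 1:2, progression = c(1L, 0L), pfs = c(24, 25.5))

# Assumed helper: number of 12-month intervals, counting a boundary value
# such as 24 as 2 intervals instead of 3. pmax() keeps at least one row
# per patient even when pfs is 0.
n_intervals <- function(pfs, width = 12) {
  pmax(1L, as.integer(ceiling(pfs / width)))
}

res <- q %>%
  mutate(interval = map(n_intervals(pfs), seq_len)) %>%
  unnest(interval) %>%
  mutate(start = 12 * (interval - 1L),
         stop  = pmin(pfs, 12 * interval))

res
```

Here id 1 gets two rows (0 to 12 and 12 to 24) and id 2 gets three, so every interval has a positive length. For the example data `p`, where no pfs value is a multiple of 12, both ways of counting give the same rows.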
