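I am trying to use pyears to estimate incidence rates in a cohort, where one of the covariates I am interested in is age at the time of the event (rather than age at enrolment, i.e. the enrolment cohort). Age at the time of the event is of course time-dependent. The correct approach appears to be to use tcut on age at enrolment, as shown in the pyears help. However, it only seems to work when the start time is always zero (or, equivalently, when you give the Surv object follow-up time rather than start/end times). For my scenario it is important to use the actual start/end times, because I also want to use other time-varying covariates such as calendar year.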
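To make the intended behaviour concrete, here is a minimal sketch of what I expect tcut to do. It is not part of my real data: the two people, the cut points and the names `demo`, `age.brk`, `age.lab` and `fit.demo` are made up for illustration, and I convert ages and cut points to days explicitly so they share the units of the follow-up time.

```r
library(survival)

# Two made-up people, follow-up measured in days
demo <- data.frame(futime  = c(365.25, 365.25),
                   outcome = c(0, 0),
                   age.enr = c(24.5, 30))   # age at enrolment, in years

# Convert cut points to days so they share the units of futime
age.brk <- c(0, 25, 35) * 365.25
age.lab <- c("0-25", "25-35")

fit.demo <- pyears(Surv(futime, outcome) ~
                     tcut(age.enr * 365.25, age.brk, labels = age.lab),
                   data = demo, scale = 1)

# Person-days per age band: the first person crosses age 25 halfway through
# follow-up, so contributes half a year to each band
fit.demo$pyears
```

The point is that a person's follow-up is split across age bands as they age, rather than being assigned wholesale to the band they enrolled in.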
Here is an example that illustrates the problem:
library(tidyverse)
library(survival)
# encode actual start/end dates
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))
# encode time elapsed from origin of zero
s2 <- tibble(stime = 0,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))
# these ought to give the same results, but don't (the second one appears to be right)
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
# test it with a dataset where start time is always zero - works
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
This gives:
> # these ought to give the same results, but don't (the second one appears to be right)
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
       0.00        0.00        0.00        0.00      365.25
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
       0.00      365.25      730.50     1461.00      730.50
>
> # test it with a dataset where start time is always zero - works
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
     730.50     1095.75     1095.75      730.50        0.00
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
     730.50     1095.75     1095.75      730.50        0.00
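The first dataset (s1) fails when start/end times are supplied but works when elapsed time is supplied, whereas the second dataset (s2) works with either start/end or elapsed time (because the start time is artificially set to zero).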
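I realise that supplying elapsed time is a workaround in this case, but shouldn't pyears + tcut behave the same regardless of how the interval is encoded? Am I misunderstanding what tcut is supposed to do?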
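For what it's worth, the first table looks consistent with pyears treating the tcut value as the age at time zero of the start/stop scale and then adding the start time to it (stime is roughly 30 years' worth of days, which would push most people past the last cut point). That is only my reading of the output, not something I have confirmed in the documentation. Below is a minimal sketch to check it. The fixed ages and the names `chk`, `age0`, `brk`, `lab`, `p.elapsed` and `p.shifted` are made up for illustration, and ages are converted to days explicitly.

```r
library(survival)

# Fixed (made-up) enrolment ages so the check is reproducible
chk <- data.frame(stime   = as.numeric(as.Date("2000-01-01")) + 1:10,
                  outcome = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
                  age.enr = c(20, 25, 30, 35, 40, 45, 50, 55, 60, 62))
chk$etime  <- chk$stime + 365.25
chk$futime <- chk$etime - chk$stime

# Age in days shifted back to time zero of the start/stop scale (1970-01-01);
# assumption being tested: pyears adds the start time to this value
chk$age0 <- chk$age.enr * 365.25 - chk$stime

brk <- c(0, 24, 34, 44, 54, 64) * 365.25
lab <- c("0-24", "24-34", "34-44", "44-54", "54-64")

# Elapsed-time encoding, age at entry
p.elapsed <- pyears(Surv(futime, outcome) ~
                      tcut(age.enr * 365.25, brk, labels = lab),
                    data = chk, scale = 1)

# Start/stop encoding, age shifted back to time zero
p.shifted <- pyears(Surv(stime, etime, outcome) ~
                      tcut(age0, brk, labels = lab),
                    data = chk, scale = 1)

p.elapsed$pyears
p.shifted$pyears
all.equal(as.numeric(p.elapsed$pyears), as.numeric(p.shifted$pyears))
```

If the two tables agree, the discrepancy would come down to which time point the tcut variable is taken to describe, rather than to the interval encoding itself.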
Thanks, Peter
To tally age correctly, I need to specify age at the start of the interval, not age at the (earlier) enrolment date, as shown below:
# another example, using DOB which is truly constant
set.seed(1234)
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 3652.50,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             dob = round(runif(10, as.Date("1930-01-01"),
                               as.Date("1985-01-01"))),
             age.enr = floor((stime - dob)/365.25),
             age.end = floor((etime - dob)/365.25),
             sobj = Surv(etime - stime, outcome)) # just for convenience
summary(s1)
s1 %>% mutate_at(vars(stime, etime, dob), ~as.Date(.x, origin="1970-01-01"))
s1$enrd <- s1$stime - 365.25*3 # simulate an enrolment date 3 years prior to this interval
s1$age.int <- s1$age.enr # actually, this is the age at beginning of interval, not enrolment
s1$age.enr <- floor((s1$enrd - s1$dob)/365.25)
pyears(sobj ~ tcut(age.enr, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # incorrect
pyears(sobj ~ tcut(age.int, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # correct
Cutting on "age.int" appears to give the desired behaviour. I have also (I think) taken @AllanMeron's suggestion to simply store the Surv object in the data.frame.
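Since my original motivation was to add calendar year as a second time-varying covariate, here is a minimal sketch of how I would combine the two while staying with elapsed follow-up time: one tcut on age at the start of the interval and one on the calendar date at the start of the interval, both in days. The two people, the cut points and the names `cal`, `age.brk2`, `age.lab2`, `yr.brk`, `yr.lab` and `fit.cal` are made up for illustration.

```r
library(survival)

# Two made-up people; dates stored as days since 1970-01-01
cal <- data.frame(stime   = as.numeric(as.Date(c("2000-07-01", "2001-01-01"))),
                  dob     = as.numeric(as.Date(c("1970-01-01", "1950-07-01"))),
                  outcome = c(0, 1))
cal$etime   <- cal$stime + 365
cal$futime  <- cal$etime - cal$stime
cal$age.int <- cal$stime - cal$dob   # age in days at the start of the interval

# Age bands in days, calendar-year bands as dates converted to days
age.brk2 <- c(0, 25, 35, 45, 55, 65, 999) * 365.25
age.lab2 <- c("0-24", "25-34", "35-44", "45-54", "55-64", "65+")
yr.brk   <- as.numeric(as.Date(paste0(2000:2003, "-01-01")))
yr.lab   <- as.character(2000:2002)

fit.cal <- pyears(Surv(futime, outcome) ~
                    tcut(age.int, age.brk2, labels = age.lab2) +
                    tcut(stime, yr.brk, labels = yr.lab),
                  data = cal, scale = 1)

fit.cal$pyears    # person-days by age band and calendar year
fit.cal$event     # events by age band and calendar year

# Sanity check: tabulated time plus off-table time equals total follow-up
all.equal(sum(fit.cal$pyears) + fit.cal$offtable, sum(cal$futime))
```

Both tcut variables describe the person at the start of follow-up, which seems to be what the elapsed-time form of Surv expects.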