R: tcut behaviour with pyears when using start/stop times rather than follow-up times

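I'm trying to use pyears to estimate incidence rates in a cohort where one of the covariates I'm interested in is age at the time of the event (as opposed to age at enrolment, i.e. the enrolment cohort). Age at the time of the event is of course time-dependent. The correct approach seems to be to use tcut on age at enrolment, as shown in the pyears help. However, it only seems to work when the start time is always zero (or, equivalently, when you give the Surv object a follow-up time rather than start/end times). For my scenario, using the actual start/end times matters because I also want to use other time-varying covariates such as calendar year.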
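
For reference, here is a minimal sketch of the tcut behaviour I am relying on when follow-up time is supplied. The data frame `toy` and its values are made up for illustration: someone aged 23.5 at entry and followed for one year should contribute half a year to each of the two age bands they pass through.

```r
library(survival)

# Hypothetical toy data: follow-up in days, age at entry in years
toy <- data.frame(futime  = c(365.25, 365.25),
                  outcome = c(0, 0),
                  age     = c(23.5, 40))

# scale = 365.25 puts age and the cutpoints on the same day scale as futime
py_toy <- pyears(Surv(futime, outcome) ~ tcut(age, c(0, 24, 34, 44), scale = 365.25),
                 data = toy, scale = 1)$pyears
py_toy
```

The first person's follow-up should be split between the "0+ thru 24" and "24+ thru 34" bands, while the second person's year should land entirely in "34+ thru 44".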

Here is an example that illustrates the problem:

library(tidyverse)
library(survival)
# encode actual start/end dates
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))
# encode time elapsed from origin of zero
s2 <- tibble(stime = 0,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))
# these ought to give the same results, but don't (the second one appears to be right)
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
# test it with a dataset where start time is always zero - works
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears

This results in:

> # these ought to give the same results, but don't (the second one appears to be right)
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
       0.00        0.00        0.00        0.00      365.25 
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
       0.00      365.25      730.50     1461.00      730.50 
> 
> # test it with a dataset where start time is always zero - works
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
     730.50     1095.75     1095.75      730.50        0.00 
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
     730.50     1095.75     1095.75      730.50        0.00 

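The first example fails when start/end times are supplied but works when elapsed time is supplied, whereas the second example works with either start/end times or elapsed time (because the start time has been artificially set to zero).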

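I realise this gives me a workaround in this case, but shouldn't pyears + tcut behave the same regardless of how the interval is encoded? Am I misunderstanding what tcut is supposed to do?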

Thanks, Peter
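
One reading that seems consistent with the first output above, offered as an assumption rather than documented behaviour: with (start, stop) input, pyears may treat the tcut value as the age at time zero of the time scale and advance it by the start time before accumulating follow-up. With start times around 2000-01-01 (about 30 years after the 1970 origin), everyone would then be counted as roughly 30 years older than intended. The sketch below checks that idea on a hypothetical data frame `chk`, with a made-up column `age0` holding the age at the origin.

```r
library(survival)

# Hypothetical check data: start/stop in days since 1970-01-01
chk <- data.frame(stime   = as.numeric(as.Date("2000-01-01")) + 1:3,
                  outcome = c(1, 0, 0),
                  age.enr = c(20, 30, 50))   # age in years at start of interval
chk$etime  <- chk$stime + 365.25
chk$futime <- chk$etime - chk$stime

# Assumption: pyears adds the start time to the tcut value,
# so supply the (possibly negative) age at time zero instead
chk$age0 <- chk$age.enr - chk$stime / 365.25

brk <- c(0, 24, 34, 44, 54, 64)
py_ss <- pyears(Surv(stime, etime, outcome) ~ tcut(age0, brk, scale = 365.25),
                data = chk, scale = 1)$pyears
py_fu <- pyears(Surv(futime, outcome) ~ tcut(age.enr, brk, scale = 365.25),
                data = chk, scale = 1)$pyears

# Compare the start/stop coding with the follow-up coding
all.equal(as.numeric(py_ss), as.numeric(py_fu))
```

If the assumption holds, the two codings should agree; if they do not, this reading is wrong and the discrepancy has some other cause.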

To count age correctly, I need to specify the age at the start of the interval rather than the age at the (earlier) enrolment date, as follows:

# another example, using DOB which is truly constant
set.seed(1234)
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 3652.50,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             dob = round(runif(10, as.Date("1930-01-01"),
                               as.Date("1985-01-01"))),
             age.enr = floor((stime - dob)/365.25),
             age.end = floor((etime - dob)/365.25),
             sobj = Surv(etime - stime, outcome)) # just for convenience
summary(s1)
s1 %>% mutate_at(vars(stime, etime, dob), ~as.Date(.x, origin="1970-01-01"))
s1$enrd <- s1$stime - 365.25*3               # simulate an enrolment date 3 years prior to this interval
s1$age.int <- s1$age.enr                     # actually, this is the age at beginning of interval, not enrolment
s1$age.enr <- floor((s1$enrd - s1$dob)/365.25)
pyears(sobj ~ tcut(age.enr, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # incorrect
pyears(sobj ~ tcut(age.int, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # correct

Cutting on "age.int" seems to give the desired behaviour. I have also (I think) taken @AllanMeron's suggestion of storing the objects only in the data.frame.
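
For the calendar-year covariate mentioned at the start, here is a hedged sketch that keeps the real start/end times. It rests on an assumption I have not confirmed in the documentation: that for (start, stop) input pyears reads each tcut value as the value at time zero. The data frame `two` and the columns `age0` and `cal0` are made up for illustration; all times are days since 1970-01-01.

```r
library(survival)

# Hypothetical data: two people with real entry/exit dates and a date of birth
two <- data.frame(stime   = as.numeric(as.Date(c("2000-07-01", "2000-01-01"))),
                  etime   = as.numeric(as.Date(c("2001-07-01", "2001-01-01"))),
                  outcome = c(0, 0),
                  dob     = as.numeric(as.Date(c("1970-07-01", "1960-01-01"))))

# Assumed values at time zero (1970-01-01): age is -dob days, calendar time is 0
two$age0 <- -two$dob / 365.25   # in years, negative if born after the origin
two$cal0 <- 0

cal_brk <- as.numeric(as.Date(c("2000-01-01", "2001-01-01", "2002-01-01")))

py_two <- pyears(Surv(stime, etime, outcome) ~
                   tcut(age0, c(0, 25, 35, 45), scale = 365.25) +
                   tcut(cal0, cal_brk, labels = c("2000", "2001")),
                 data = two, scale = 1)$pyears
py_two   # rows: age band, columns: calendar year, cells: person-days
```

Under that assumption, the first person's follow-up should be split across 2000 and 2001 within the 25 to 35 age band, and the second person's should fall entirely in 2000 within the 35 to 45 band.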
