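I have a dataset of neurosurgical patients and I am creating survival curves for it. I am trying to adjust my curves to match the age-sex distribution of the US population in 2000, which is included in the R survival package. This 'uspop2' dataset is an array of counts with dimensions of age, sex, and calendar year. To begin with, I am focusing only on people aged 50 and over, so I create a table "tab100" for our own data that contains the observed age/sex counts, using the same upper age threshold (100). The new weights are: p = pi.us/tab100.
This is the first code I wrote (note that I am using R through rpy2 in Google Colab):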
%%R
#Reweighting
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))
refpop <- uspop2[as.character(50:100),c("female", "male"), "2000"]
pi.us <- refpop/sum(refpop)
age100 <- factor(ifelse(mydata$Age..yrs. >100, 100, mydata$Age..yrs.), levels=50:100)
tab100 <- with(mydata, table(age100, mydata$Sex, mydata$group))/ nrow(mydata)
us.wt <- rep(pi.us, 3)/ tab100 #new weights by age,sex, group
range(us.wt)
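This produces a range from 0.006709405 to infinity! The infinite weights arise because the US population has every age-sex combination, but my neurosurgical patient dataset does not. To get rid of these infinite weights, I tried to break the US population into age groups...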
%%R
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))
temp <- as.numeric(cut(50:100, c(49, 54, 59, 64, 69, 74, 79, 89, 110)+.5))
pi.us<- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum)/sum(refpop)
print(pi.us)
tab2 <- with(mydata, table(mydata$Age..yrs., mydata$Sex, mydata$group))/nrow(mydata)
print(tab2)
us.wt <- rep(pi.us, 3)/tab2
print(range(us.wt))
index <- with(mydata, cbind(mydata$Age..yrs., mydata$Sex,
as.numeric(mydata$group)))
mydata$uswt <- us.wt[index]
sfit3a <-survfit(Surv(Patient.LOS..days., Events) ~ group, data=mydata, weight=uswt)
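Printing pi.us and tab2 tells me that I successfully split the ages into 8 groups. However, when I run us.wt <- rep(pi.us, 3)/tab2, us.wt is still exactly the same as before! It does not change. As you can see below, the range of the output has a different lower bound but still goes all the way to infinity. Unsurprisingly, the next line of code then fails with a subscript out of bounds error. What is going on?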
[1] 0.4655699 Inf
R[write to console]: Error in `[.default`(us.wt, index) : subscript out of bounds
Error in `[.default`(us.wt, index) : subscript out of bounds
By the way, my code is based entirely on page 7 of this R vignette: https://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf
What am I doing wrong? Thanks for your help!
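This answers your question, but it does not solve your problem. Look at the index and us.wt objects. The margin names of the us.wt array clearly do not match the values in index:
> str(us.wt)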
'table' num [1:48, 1:3, 1:3] Inf Inf Inf Inf Inf ...
- attr(*, "dimnames")=List of 3
..$ : chr [1:48] "2" "3" "4" "5" ...
..$ : chr [1:3] "" "F" "M"
..$ : chr [1:3] "Plunge <= 2 mm" "Plunge 2 - 4 mm" "Plunge > 4 mm"
> str(index)
chr [1:240, 1:3] "2" "7" "11" "75" "59" "3" "88" "13" "75" "80" "5" "3" "65" "66" "93" "45" ...
> head(index)
[,1] [,2] [,3]
[1,] "2" "M" "1"
[2,] "7" "M" "3"
[3,] "11" "M" "1"
[4,] "75" "M" "3"
[5,] "59" "M" "1"
[6,] "3" "M" "3"
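To make the mismatch concrete: because cbind() mixes numeric ages with character sex codes, index becomes a character matrix, and R matches each column of a character index matrix against the corresponding dimnames. In the output above the third column holds values like "1" and "3", while the third margin of us.wt is named by the plunge labels. The following is a minimal sketch on a toy array (the array w and its values are made up for illustration, not taken from the question's data) showing one lookup that matches and one that fails:

```r
# Toy 3-d array with named margins (hypothetical values for illustration only)
w <- array(1:12, dim = c(2, 2, 3),
           dimnames = list(c("50", "51"), c("F", "M"), c("g1", "g2", "g3")))

# A character index matrix is matched against the dimnames, one column per margin
ok <- cbind("51", "M", "g2")
w[ok]   # element in row "51", column "M", slice "g2": 8

# "1" is not a name on the third margin, so the lookup fails
bad <- cbind("51", "M", "1")
res <- tryCatch(w[bad], error = function(e) conditionMessage(e))
print(res)
```

Building the index from numeric positions in every column (for example with as.numeric() on factors) avoids name matching altogether.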
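I also think the structure of the us.wt array is messed up. Since you did not describe the logic behind its construction or what it is meant to achieve, I am not inclined to read your mind and offer suggestions. Here is what makes me think it is wrong: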
> Hmisc::describe(us.wt)
us.wt
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75
432 0 32 0.691 Inf NaN 4.264 7.864 16.032 Inf Inf
.90 .95
Inf Inf
lowest : 0.5586839 1.1173678 3.1599027 3.4755957 4.2639763
highest: 20.3399270 21.7412450 27.0462314 28.2128223 Inf
Warning message:
In w * sort(x - mean(x)) :
longer object length is not a multiple of shorter object length
# Notice that more than half of the values are Inf
> head(us.wt)
, , = Plunge <= 2 mm
F M
2 Inf 7.053206 7.053206
3 Inf 10.870622 10.870622
4 Inf Inf Inf
5 Inf 15.922230 15.922230
6 Inf Inf Inf
7 Inf 13.581011 13.581011
, , = Plunge 2 - 4 mm
F M
2 Inf 14.106411 14.106411
3 Inf Inf Inf
4 Inf 17.682214 17.682214
5 Inf Inf Inf
6 Inf Inf Inf
7 Inf Inf Inf
, , = Plunge > 4 mm
F M
2 Inf Inf Inf
3 Inf 10.870622 10.870622
4 Inf 17.682214 17.682214
5 Inf Inf Inf
6 Inf 15.348446 15.348446
7 Inf 13.581011 13.581011
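One possible reading of what went wrong, offered as a guess rather than a confirmed diagnosis: tab2 is still tabulated by single year of age (48 rows), so rep(pi.us, 3), which has 8 x 2 x 3 = 48 values, is silently recycled across the 432 cells of tab2 instead of lining up with them. The sketch below uses simulated data and a made-up reference population (dat, its column names, and refpop are all assumptions standing in for mydata and uspop2). It bins the patients' ages with the same breaks used for the reference population, so both tables have 8 age rows, and it builds a purely numeric index matrix:

```r
set.seed(1)
# Hypothetical stand-in for the patient data (column names are assumptions)
n   <- 300
dat <- data.frame(
  age   = sample(50:100, n, replace = TRUE),
  sex   = factor(sample(c("F", "M"), n, replace = TRUE), levels = c("F", "M")),
  group = factor(sample(1:3, n, replace = TRUE), levels = 1:3)
)

# Made-up reference counts by single year of age (rows) and sex (columns);
# in the question this role is played by a slice of survival::uspop2
refpop <- matrix(rep(seq(200, 100, length.out = 51), 2), ncol = 2,
                 dimnames = list(as.character(50:100), c("female", "male")))

breaks <- c(49, 54, 59, 64, 69, 74, 79, 89, 110) + 0.5

# Reference proportions by age band and sex (8 x 2)
band_ref <- as.numeric(cut(50:100, breaks))
pi.us <- tapply(refpop, list(band_ref[row(refpop)], col(refpop)), sum) / sum(refpop)

# Bin the patients' ages with the SAME breaks, so the table is 8 x 2 x 3
dat$band <- cut(dat$age, breaks)
tab2 <- with(dat, table(band, sex, group)) / nrow(dat)
stopifnot(all(dim(tab2)[1:2] == dim(pi.us)))

# rep() repeats the 8 x 2 block once per group, matching tab2's layout
us.wt <- rep(pi.us, 3) / tab2

# Numeric index matrix: as.numeric() on a factor gives positions, not labels
index <- with(dat, cbind(as.numeric(band), as.numeric(sex), as.numeric(group)))
dat$uswt <- us.wt[index]
range(dat$uswt)
```

Cells with no patients still come out as Inf in us.wt, but they are never looked up, because index only contains combinations that occur in the data. The str() output above also shows ages such as 2 and 7 and an empty-string sex level, so the real data would presumably need to be restricted to patients aged 50 and over with a recorded sex before this approach applies.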