R: How do I loop over a model, dropping one observation each time?



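I am having trouble looping over a regression model while dropping one observation at a time, in order to estimate the effect of influential observations.

I want to run the model repeatedly, dropping the i-th observation each time, extracting the relevant coefficient estimate and storing it in a vector. I assumed a fairly straightforward loop would do this, but I am stuck on the details.

The goal is a vector of n coefficient estimates from n iterations of the same model. Any help would be appreciated!

Below I provide some dummy data and example code.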

# Dummy data:
library(dplyr)  # provides %>% and mutate()
set.seed(489)
patientn <- 1:400
gender <- rbinom(400, 1, 0.5)
productid <- rep(c("Product A", "Product B"), times = 200)
country <- rep(c("USA", "UK", "Canada", "Mexico"), each = 50)
baselarea <- rnorm(400, 400, 60)   # baseline area
baselarea2 <- rnorm(400, 400, 65)  # baseline area2
sfactor <- c(
  rep(c(0.3, 0.9), times = 25),
  rep(c(0.4, 0.5), times = 25),
  rep(c(0.2, 0.4), times = 25),
  rep(c(0.3, 0.7), times = 25)
)
# country and sfactor have 200 values each and are recycled to 400 rows
rashdummy2a <- data.frame(patientn, gender, productid, country, baselarea, baselarea2, sfactor)
# stored as `data` (lowercase) so that it matches the model calls below
data <- rashdummy2a %>% mutate(rashleft = baselarea2 * sfactor / baselarea * 100)

## Example of how this can be done manually: 
# model
m1<-lm(rashleft ~ gender + baselarea + sfactor, data = data)
# extracting relevant coefficient estimates, each time dropping a different "patient" ("patientn")
betas <- c(
  lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, subset = patientn != 1)$coefficients[2],
  lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, subset = patientn != 2)$coefficients[2],
  lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, subset = patientn != 3)$coefficients[2]
)
# the betas vector now stores the relevant coefficient estimates (coefficient nr 2, for gender) for three different variations of the model.  

You can use a for loop. Your question uses an object, rashdummy2b, that is never defined. I have used data here, but you can substitute whichever object you want.

# create list to bind results to
result <- list()
# loop through patients and extract betas
for (i in unique(data$patientn)) {
  # construct linear model without patient i
  lm.model <- lm(rashleft ~ gender + baselarea + sfactor,
                 data = subset(data, patientn != i))

  # create data.frame containing patient left out and coefficient
  result.dt <- data.frame(beta = lm.model$coefficients[[2]],
                          patient_left_out = i)

  # bind to list
  result[[i]] <- result.dt
}
# bind to data.frame
result <- do.call(rbind, result)

Result:

head(result)
      beta patient_left_out
1 1.381248                1
2 1.345188                2
3 1.427784                3
4 1.361674                4
5 1.420417                5
6 1.454196                6
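
The same idea can be written without growing a list inside the loop. This is only a minimal sketch, assuming a small simulated data frame (`toy`, with hypothetical columns `id`, `x` and `y`) in place of the question's data; `loo_beta` is an illustrative helper name, not a function from any package.

```r
# Toy data standing in for the question's data frame (hypothetical names)
set.seed(1)
toy <- data.frame(id = 1:30, x = rnorm(30))
toy$y <- 2 * toy$x + rnorm(30)

# Refit the model without one id and return the slope on x
loo_beta <- function(drop_id, df) {
  fit <- lm(y ~ x, data = df[df$id != drop_id, ])
  unname(coef(fit)["x"])
}

# One estimate per left-out id, collected into a data frame
result_toy <- data.frame(
  patient_left_out = toy$id,
  beta = vapply(toy$id, loo_beta, numeric(1), df = toy)
)
head(result_toy)
```

`vapply()` checks that every iteration returns exactly one number, so a failed or malformed fit surfaces as an error instead of silently producing a ragged result.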

You can drop specific rows (or columns) by using negative indices. In your case, you could proceed as follows:

betas <- numeric(nrow(rashdummy2b))  # memory preallocation
for (i in seq_len(nrow(rashdummy2b))) {
  betas[i] <- lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b[-i, ])$coefficients[2]
}
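
Once the leave-one-out estimates exist, comparing them with the full-data estimate shows which rows move the coefficient most. The sketch below uses the built-in `mtcars` data rather than the question's data, with negative indexing as above; the model `mpg ~ wt + hp` is an arbitrary choice for illustration.

```r
# Full-data fit on a built-in dataset (stand-in for the question's data)
full_fit <- lm(mpg ~ wt + hp, data = mtcars)
full_beta <- coef(full_fit)[["wt"]]

# Preallocate, then refit with row i removed via negative indexing
n <- nrow(mtcars)
betas <- numeric(n)
for (i in seq_len(n)) {
  betas[i] <- coef(lm(mpg ~ wt + hp, data = mtcars[-i, ]))[["wt"]]
}

# Shift in the wt coefficient caused by dropping each row
shift <- betas - full_beta
names(shift) <- rownames(mtcars)

# Rows whose removal changes the estimate most
head(sort(abs(shift), decreasing = TRUE), 3)
```

Base R's `dfbeta()` also reports per-observation changes in the coefficients of a fitted `lm` without an explicit loop, so it may be a useful cross-check on a hand-written version.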
