R for loop: compiling random forest results



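In a for loop, I want to subset my dataset by group ID, run a random forest for each group (500 unique group IDs), get the % variance explained for each group, and build a final data frame holding each group ID and its % variance explained. My code is below. I know I'm close because the random forest runs; I just can't figure out the last few steps. FYI, "results" is a data frame containing only the group ID and the actual response; "dataframe1" holds all explanatory variables and the response for all groups.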

```r
library(randomForest)

for (i in seq_along(unique(dataframe1$Group))) {
  # subset the dataset for each group
  dat_i <- subset(dataframe1, dataframe1$Group == unique(dataframe1$Group)[i])
  # run a random forest on each group subset
  rf.i <- randomForest(Response ~ ., data = dat_i, proximity = TRUE)
  # generate predicted values to compare for accuracy assessments
  dat_i$predicted <- unname(predict(rf.i, dat_i))
  dat_i$var <- 1 - (sum((dat_i$Response - dat_i$predicted)^2) /
                      sum((dat_i$Response - mean(dat_i$Response))^2))
  # add in group variables
  combined_df <- merge(results, dat_i, by = "Group")
}
```

I would do it like this (I'm using the built-in iris dataset because you didn't provide a reproducible example):

library(randomForest)
data(iris)
# assign a group number according to Species
iris$Group <- as.integer(iris$Species)
dataframe1 <- iris
# initialize a list for storing results
var <- vector("list", length(unique(dataframe1$Group)))
for (i in seq_along(unique(dataframe1$Group))) {
  # subset data by group (this assumes the group IDs are 1, 2, ..., n, as they are here)
  dat_i <- dataframe1[dataframe1$Group == i, ]
  # run a random forest on each group subset
  rf.i <- randomForest(Sepal.Length ~ ., data = dat_i, proximity = TRUE)
  # generate predicted values on the same data for the accuracy assessment
  dat_i$predicted <- unname(predict(rf.i, dat_i))
  # store the proportion of variance explained (in-sample R^2) in the list
  var[[i]] <- 1 - sum((dat_i$Sepal.Length - dat_i$predicted)^2) /
    sum((dat_i$Sepal.Length - mean(dat_i$Sepal.Length))^2)
}
# convert to a data.frame
result <- as.data.frame(do.call(rbind, var))
# add the Group column
result$Group <- seq_along(unique(dataframe1$Group))

Output:

         V1 Group
1 0.3568714     1
2 0.4507938     2
3 0.4178580     3
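
If your group IDs are not simply 1, 2, ..., n, or if you want the "% Var explained" figure that print() shows for a regression forest, something along these lines might work. This is only a sketch under a few assumptions: the group IDs 101, 205 and 309 are made up for illustration, "results" here is a hypothetical stand-in for your data frame of group IDs and actual responses, and the column name PctVarExplained is my own choice. It reads the out-of-bag pseudo R-squared stored in the fitted object's rsq component, which will usually come out lower than an R-squared computed from predictions on the training data.

```r
library(randomForest)

# Illustrative data: iris with hypothetical, non-sequential group IDs
data(iris)
dataframe1 <- iris
dataframe1$Group <- c(101L, 205L, 309L)[as.integer(iris$Species)]
dataframe1$Species <- NULL  # constant within each group, so no use as a predictor

# Hypothetical stand-in for "results": group ID plus the actual response
results <- dataframe1[, c("Group", "Sepal.Length")]

set.seed(42)  # make the forests reproducible

# Fit one forest per group; split() does not care what the IDs look like
pct_var <- sapply(split(dataframe1, dataframe1$Group), function(dat_i) {
  # leave the Group column out of the predictors
  rf_i <- randomForest(Sepal.Length ~ ., data = dat_i[, names(dat_i) != "Group"])
  # rsq holds the out-of-bag pseudo R-squared after each tree;
  # the last entry times 100 is what print(rf_i) reports as "% Var explained"
  100 * rf_i$rsq[length(rf_i$rsq)]
})

# One row per group: the group ID and its % variance explained
var_explained <- data.frame(
  Group = type.convert(names(pct_var), as.is = TRUE),
  PctVarExplained = unname(pct_var)
)

# Attach the per-group figure to every row of "results"
combined_df <- merge(results, var_explained, by = "Group")
```

Here var_explained has one row per group, and combined_df repeats each group's value next to every response in that group. With 500 groups the same code should apply unchanged, though groups with very few rows may give unstable or even negative values.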
