r - Trying to use a loop to plot and visualize each data frame stored in a list



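Sorry if this is a newbie question and a long post, and thanks in advance. I have a dataset with 88,250 rows and 131 columns: rows are observations and columns are labels and variables (columns 1:21 are character labels, columns 22:131 are double-precision variables). I am trying to visualize it with UMAP from the uwot library and then move on to supervised training. The first thing I am trying to do is tune the UMAP parameters n_neighbors and min_dist. The UMAP output is a table of X and Y coordinates, which I can append to my data frame and plot. Below is the code for one chosen set of parameters. I can draw a scatter plot and turn it into a 2D density plot to visualize the differences between treatments, hence the facet_wrap.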

library(uwot)
# define real data and labels
df.labels <- df[, 1:21]
df.data <- df[, 22:131]
# apply UMAP transformation
df.umap <- umap(df.data, n_sgd_threads = 0, n_trees = 500, n_neighbors = 50,
                min_dist = 0.2, pca = 50,
                verbose = TRUE)
df$UMAPX <- df.umap[, 1]
df$UMAPY <- df.umap[, 2]
library(ggplot2)
m <- ggplot(df, aes(x = UMAPX, y = UMAPY)) +
  geom_point() +
  scale_x_continuous(name = "UMAP_X-axis_coordinates") +
  scale_y_continuous(name = "UMAP_y-axis_coordinates") +
  theme(axis.text.x = element_blank()) +
  theme(axis.text.y = element_blank()) +
  theme(axis.line = element_line(colour = "black",
                                 size = 0.1,
                                 linetype = "solid")) +
  labs(title = "UMAP visualisation")

# try a 2D density plot to see the distribution
m +
  geom_density_2d() +
  stat_density_2d(aes(fill = ..level..), geom = "polygon") +
  scale_fill_gradient(low = "blue", high = "red") +
  facet_wrap(df.labels$treatmentsum ~ .)

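Now I want to write a loop that stores all UMAP results in a list, where each element is a data frame with the UMAP X and Y coordinates for one tested pair of parameter values. This worked, and I got my list.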

# attempt to perform a grid search for hyperparameter tuning
# iterate over the grid, set manually
# performance evaluation
n_neighbors.test <- seq(1, 100, 20)
min_dist.test <- seq(0.05, 4, 0.5)
# create a data frame containing all combinations of the grid
hyper_grid <- expand.grid(n_neighbors = n_neighbors.test, min_dist = min_dist.test)
# create an empty list to store the models
models <- list()
# execute the grid search
for (i in 1:nrow(hyper_grid)) {
  # get the value pair at row i
  n_neighbors <- hyper_grid$n_neighbors[i]
  min_dist <- hyper_grid$min_dist[i]
  # train a model and store it in the list
  # NOTE: n_neighbors and min_dist are not passed to umap() here,
  # so every run uses the default values
  models[[i]] <- umap(df.data, n_sgd_threads = 0, n_trees = 500)
}
# collect the x, y coordinates from the UMAP grid search into a list of data frames for later visualisation
para <- list()
for (i in 1:40) {
  df$UMAPX <- models[[i]][, 1]
  df$UMAPY <- models[[i]][, 2]
  para[[i]] <- cbind(df, df$UMAPX, df$UMAPY)
}
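A minimal sketch of a grid search that forwards each parameter pair to umap() might look like the following. It is only an illustration under stated assumptions: the built-in iris data stands in for df.data and df.labels, and the names toy.data, toy.labels, toy_grid and toy_para, as well as the grid values, are hypothetical and not from the original post.

```r
library(uwot)

# Toy stand-in for df.data / df.labels (assumption: iris replaces the real data)
toy.data <- iris[, 1:4]
toy.labels <- iris[, 5, drop = FALSE]

# Small illustrative grid (assumed values, not the grid from the post)
toy_grid <- expand.grid(n_neighbors = c(5, 15), min_dist = c(0.05, 0.5))

set.seed(42)
toy_para <- lapply(seq_len(nrow(toy_grid)), function(i) {
  # Pass the i-th parameter pair explicitly to umap()
  emb <- umap(toy.data,
              n_neighbors = toy_grid$n_neighbors[i],
              min_dist = toy_grid$min_dist[i])
  # One data frame per parameter pair: labels plus the two UMAP coordinates
  data.frame(toy.labels, UMAPX = emb[, 1], UMAPY = emb[, 2])
})
```

Building each data frame inside lapply avoids overwriting columns of a shared data frame on every iteration, and the list index i still lines up with row i of the grid.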

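This is where I am stuck. I want to loop this ggplot code over every data frame in the list, with x = UMAPX and y = UMAPY, to produce 40 plots, one per tested pair of n_neighbors and min_dist, each facet-wrapped into 15 panels. I thought I could turn the earlier ggplot code into a function and use map to apply it to everything in the list para, then print the plots. But the plot list contains only NULL, no error is returned, and the resulting PDF file is empty.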

library(purrr)
plot <- map(para, function(i) {
  for (i in 1:40) {
    ggplot(para[[i]], aes(x = UMAPX, y = UMAPY)) +
      geom_point() +
      scale_x_continuous(name = "UMAP_X-axis_coordinates") +
      scale_y_continuous(name = "UMAP_y-axis_coordinates") +
      theme(axis.text.x = element_blank()) +
      theme(axis.text.y = element_blank()) +
      theme(axis.line = element_line(colour = "black",
                                     size = 0.1,
                                     linetype = "solid")) +
      labs(title = "UMAP visualisation for model") +
      geom_density_2d() +
      stat_density_2d(aes(fill = ..level..), geom = "polygon") +
      scale_fill_gradient(low = "blue", high = "red") +
      facet_wrap(df.labels$treatmentsum ~ .)
  }

})
pdf("plots.pdf")
for (i in 1:length(plot)) {
  print(plot[[i]])
}
dev.off()

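The answer to the original question is in the comments: remove the inner for loop and replace para[[i]] with i, since map already passes each element of para to the function as i.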
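Why the original list held only NULL values can be seen in a minimal base-R sketch: a for loop itself evaluates to NULL, so a function whose last expression is a for loop returns NULL for every element. The example uses lapply, which collects return values the same way map does; the names inputs, with_loop and without_loop are made up for illustration.

```r
inputs <- list(1, 2, 3)

# The last expression of the function is a for loop, which evaluates to NULL
with_loop <- lapply(inputs, function(x) {
  for (j in 1:3) {
    x * j  # computed, then discarded
  }
})

# The last expression is the value itself, so it is returned
without_loop <- lapply(inputs, function(x) x * 2)

str(with_loop)     # a list of three NULLs
str(without_loop)  # a list of three numbers
```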

To add a title to the plots:

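One way is to map over para and the n_neighbors column of hyper_grid at the same time and use the value in the title. If I understand your code correctly, the following should work. If 40 is the total number of rows of hyper_grid, subsetting hyper_grid$n_neighbors with [1:40] is probably unnecessary.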

plot <- map2(para, hyper_grid$n_neighbors[1:40], function(param, n_neighbors) {
  ggplot(param, aes(x = UMAPX, y = UMAPY)) +
    geom_point() +
    scale_x_continuous(name = "UMAP_X-axis_coordinates") +
    scale_y_continuous(name = "UMAP_y-axis_coordinates") +
    theme(axis.text.x = element_blank()) +
    theme(axis.text.y = element_blank()) +
    theme(axis.line = element_line(colour = "black",
                                   size = 0.1,
                                   linetype = "solid")) +
    labs(title = paste("UMAP visualization for model w/ n_neighbors:", n_neighbors)) +
    geom_density_2d() +
    stat_density_2d(aes(fill = ..level..), geom = "polygon") +
    scale_fill_gradient(low = "blue", high = "red") +
    facet_wrap(df.labels$treatmentsum ~ .)

})
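If both parameters should appear in the title, the same idea extends to three parallel inputs. The sketch below is one possible way to do it with base R's Map and is not part of the original answer. It assumes each data frame carries its own treatmentsum column (which would hold for para if treatmentsum is one of the label columns of df), and it uses random toy data. The helper plot_umap and the names toy_grid, toy_para, plots and out_file are hypothetical.

```r
library(ggplot2)

# Hypothetical helper: one faceted plot per data frame, titled with both parameters
plot_umap <- function(param, n_neighbors, min_dist) {
  ggplot(param, aes(x = UMAPX, y = UMAPY)) +
    geom_point() +
    labs(title = paste0("UMAP: n_neighbors = ", n_neighbors,
                        ", min_dist = ", min_dist)) +
    facet_wrap(~ treatmentsum)  # facet on a column of the plotted data frame
}

# Toy stand-ins for hyper_grid and para (random coordinates, not real UMAP output)
set.seed(1)
toy_grid <- expand.grid(n_neighbors = c(5, 15), min_dist = c(0.05, 0.5))
toy_para <- lapply(seq_len(nrow(toy_grid)), function(i) {
  data.frame(UMAPX = rnorm(60), UMAPY = rnorm(60),
             treatmentsum = rep(c("a", "b", "c"), each = 20))
})

# Map walks the three inputs in parallel and returns a list of ggplot objects
plots <- Map(plot_umap, toy_para, toy_grid$n_neighbors, toy_grid$min_dist)

# Inside a loop, ggplot objects must be printed explicitly to reach the device
out_file <- file.path(tempdir(), "plots.pdf")
pdf(out_file)
for (i in seq_along(plots)) {
  print(plots[[i]])
}
dev.off()
```

Faceting on a column of the data frame being plotted, rather than on the external vector df.labels$treatmentsum, keeps the facet variable aligned with the plotted rows if the data frames are ever filtered or reordered.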
