R: efficiently bind_rows many data frames stored on the hard drive



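I have about 50,000 .rda files. Each contains a data frame named results with a single row. I want to append them all into one data frame.

I tried the following, which works but is slow: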

root_dir <- paste(path, "models/", sep="")
files <- paste(root_dir, list.files(root_dir), sep="")
load(files[1])
results_table = results
rm(results)
for(i in c(2:length(files))) {
  print(paste("We are at step ", i,sep=""))
  load(files[i])
  results_table= bind_rows(list(results_table, results))
  rm(results)
}

Is there a more efficient way to do this?

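This is a bit easier with .rds files. If we are limited to .rda, though, the following may be useful. I'm not sure whether it is faster than what you are doing: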

library(purrr)
library(dplyr)
library(tidyr)
## make and write some sample data to .rda
x <- 1:10
fake_files <- function(x){
  df <- tibble(x = x)
  save(df, file = here::here(paste0(as.character(x), ".rda")))
  return(NULL)
}
purrr::walk(x, ~fake_files(x = .x))
## map and load the .rda files into a single tibble
load_rda <- function(file) {
  foo <- load(file = file) # foo holds the name(s) of the objects loaded
  return(get(foo[1]))      # return the loaded object by name (here, df)
}
rda_files <- tibble(files = list.files(path = here::here(),
                                       pattern = "\\.rda$", # a regex, not a glob
                                       full.names = TRUE)) %>%
  mutate(data = map(files, load_rda)) %>%
  unnest(data)
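To illustrate the remark that .rds is easier: readRDS() returns the object itself, so there is no need to look it up by name after loading. The following is only a sketch in base R; the directory name rds_demo and the sample data are made up for illustration, and it assumes you are free to save the files as .rds in the first place.

```r
# Hypothetical setup: write ten one-row data frames as .rds files in a temp directory
rds_dir <- file.path(tempdir(), "rds_demo")
dir.create(rds_dir, showWarnings = FALSE)
for (i in 1:10) {
  saveRDS(data.frame(x = i), file = file.path(rds_dir, paste0(i, ".rds")))
}

# readRDS() returns the stored object directly, so lapply() yields a list of data frames
rds_files <- list.files(rds_dir, pattern = "\\.rds$", full.names = TRUE)
rds_table <- do.call(rbind, lapply(rds_files, readRDS))
```

Note that list.files() sorts names alphabetically, so the rows come back in the order 1, 10, 2, ... rather than numerically.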

This is untested code, but it should be quite efficient:

root_dir <- paste(path, "models/", sep="")
files <- paste(root_dir, list.files(root_dir), sep="")
data_list <- lapply(files, function(f) {
  message("loading file: ", f)
  name <- load(f)     # load() returns the name of the loaded object
  return(get(name))   # return the object whose name is stored in `name`
})
results_table <- data.table::rbindlist(data_list)

data.table::rbindlist is very similar to dplyr::bind_rows, but somewhat faster.
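Both answers rely on the same idea: collect the data frames in a list and bind once at the end, instead of binding once per file, which copies the growing table on every iteration. Here is a minimal base-R sketch of that pattern. The directory rda_demo, the file names, the columns, and the helper read_results are all hypothetical; the only thing taken from the question is that each file holds a one-row data frame named results.

```r
# Hypothetical setup: five .rda files, each holding a one-row data frame named `results`
rda_dir <- file.path(tempdir(), "rda_demo")
dir.create(rda_dir, showWarnings = FALSE)
for (i in 1:5) {
  results <- data.frame(id = i, score = i / 10)
  save(results, file = file.path(rda_dir, paste0("model_", i, ".rda")))
}
rm(results)

# Load one file into a throwaway environment and return its `results` object,
# so nothing in the calling environment is overwritten
read_results <- function(f) {
  env <- new.env()
  load(f, envir = env)
  get("results", envir = env)
}

files <- list.files(rda_dir, pattern = "\\.rda$", full.names = TRUE)
data_list <- lapply(files, read_results)

# Bind once at the end instead of once per file
results_table <- do.call(rbind, data_list)
```

With data.table or dplyr installed, the last line can be swapped for data.table::rbindlist(data_list) or dplyr::bind_rows(data_list), since both accept a list of data frames.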
