Reading a dataset from .RData files in a loop

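Let's imagine the following situation:

I have a lot of .RData files, each over 100 MB (the exact size doesn't matter, but they are large).
Each .RData file contains a dataset named "Dataset_of_interest", and all of them are parts of a big dataset I want to create.

So I would like to know whether it is possible to load only the dataset I am interested in into memory, rather than the whole .RData file?

I want to loop over the files, load each "Dataset_of_interest", merge them into one big dataset, and then save the result to a single file.

Edit: I am working on Windows 7.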

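I think this is possible, and some parallel processing power helps. Each worker loads a .RData file and returns only the desired object. Merging the results is probably pretty straightforward.

I can't provide code tailored to your data because I don't know its structure, but I would do something along the lines of the chunk o' code below. Note that I am on Windows, so your workflow may differ. Make sure you are not short on memory. Also, snowfall is not the only interface for using multiple cores.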

# load library snowfall and set up working directory
# to where the RData files are
library(snowfall)
working.dir <- "/path/to/dir/with/files"
setwd(working.dir)
# initiate (redneck jargon: and then she ate) workers and export
# working directory. Working directory could be hard coded into
# the function, rendering this step moot
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
sfExport(list = c("working.dir")) # you need to export all variables but x
# read filenames and step through each, returning only the
# desired object
# pattern is a regular expression, so escape the dot and anchor the end
lofs <- list.files(pattern = "\\.RData$")
inres <- sfSapply(x = lofs, fun = function(x, wd = working.dir) {
    setwd(wd)
    load(x)
    return(Dataset_of_interest)
  }, simplify = FALSE)
sfStop()
# you could post-process the data by rbinding, cbinding, cing...
result <- do.call("rbind", inres)
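
If the parallel setup is more than you need, the same idea can be sketched sequentially in base R. This is only an illustration under a few assumptions: the helper names `load_one` and `combine_rdata` are made up for this example, the pieces are assumed to be data frames with matching columns so that `rbind` works, and the result is written with `saveRDS` as one possible way to save it to a single file. Note that `load()` still reads the whole file; loading into a throwaway environment just keeps the other objects out of your workspace.

```r
# Load one .RData file into a temporary environment and return a single
# object from it. load() reads the entire file; the unwanted objects are
# discarded together with the environment once the function returns.
load_one <- function(path, name = "Dataset_of_interest") {
  env <- new.env()
  load(path, envir = env)
  if (!exists(name, envir = env, inherits = FALSE)) {
    stop("Object '", name, "' not found in ", path)
  }
  get(name, envir = env, inherits = FALSE)
}

# Collect the named object from every .RData file in a directory,
# row-bind the pieces and save the combined result to one file.
# Assumes every piece is a data frame with the same columns.
combine_rdata <- function(dir, out_file, name = "Dataset_of_interest") {
  files <- list.files(dir, pattern = "\\.RData$", full.names = TRUE)
  pieces <- lapply(files, load_one, name = name)
  combined <- do.call(rbind, pieces)
  saveRDS(combined, out_file)
  invisible(combined)
}
```

Calling something like `combined <- combine_rdata("/path/to/dir/with/files", "combined.rds")` would then give you the merged data, which can be read back later with `readRDS("combined.rds")`. Only one source file's contents is held in the temporary environment at a time, but all extracted pieces plus the combined result still need to fit in memory.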
