How to store a folder with 30+ zipped files in a variable in R



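I used the package "GDELTtools" to download data from GDELT. The data has been downloaded, but no variable was stored in the global environment. I want to store the data in a data frame variable so that I can analyze it.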

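The folder contains 30+ zipped files. Each zip file contains one csv. I need to store all of these csv files in a single variable in the R global environment. I hope this can be done.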

Thank you in advance!

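I haven't written R in a while, so I'll do my best.

Read the comments carefully, because they explain the procedure.

I will attach links for checking the following topics: unzipping, reading CSV files, merging data frames, empty data frames, concatenating strings.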

According to the GDELTtools documentation, you can easily specify the download folder by passing local.folder="~/gdeltdata" as an argument to the GetGDELT() function.

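After that, you can use the list.files("path/to/files/directory") function to get the vector of file names used in the code explained below. See the documentation for more examples and explanations.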
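
Here is a minimal sketch of that list.files() step, so you can see what the vector looks like. The folder and the file names are made up for the demo (I create empty placeholder files in a temporary directory); with real data you would point it at your download folder instead.

```r
# Demo folder with placeholder files (hypothetical names, not real GDELT data)
demoDir <- file.path(tempdir(), "gdelt_demo")
dir.create(demoDir, showWarnings = FALSE)
file.create(file.path(demoDir, c("20140101.export.CSV.zip",
                                 "20140102.export.CSV.zip",
                                 "notes.txt")))

# File names only; the pattern keeps names that end in ".zip"
fileNamesZip <- list.files(demoDir, pattern = "\\.zip$")

# Full paths, ready to pass to unzip() one at a time
zipPaths <- list.files(demoDir, pattern = "\\.zip$", full.names = TRUE)

print(fileNamesZip)
```

With full.names = TRUE you get complete paths directly, so there is no need to paste the folder and the file name together yourself.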

# set the path for the unzip output
outDir <- "C:/Users/Name/Documents/unzipfolder"
# folder where the zip files are stored
# (forward slashes, because "\" starts an escape sequence in R strings)
relativePath <- "C:/path/to/my/directory"
# file names of the zip archives, taken from list.files()
fileNamesZip <- list.files(relativePath, pattern = "\\.zip$")
# create a variable to store all the paths to the zip files in a vector
zipPaths <- vector("character")
# since we have 30+ files we should iterate through them
for (name in fileNamesZip) {
  # file.path() joins the folder and the file name with a separator
  zipfilepath <- file.path(relativePath, name)
  # append() returns a new vector, so assign the result back
  zipPaths <- append(zipPaths, zipfilepath)
}
# now we have a vector which contains all the paths to the zip files
# unzip() extracts one archive per call, so call it once per path
for (zipfilepath in zipPaths) {
  unzip(zipfile = zipfilepath, exdir = outDir)
}
# file names of the extracted csv files (matches .csv and .CSV)
fileNamesCSV <- list.files(outDir, pattern = "\\.csv$", ignore.case = TRUE)
# list that collects one data frame per csv file
dataFrames <- list()
# now it's time to read the csv files and store them as data frames
for (name in fileNamesCSV) {
  # create the csv file path
  csvfilepath <- file.path(outDir, name)
  # read the data from the csv file and store it in a data frame
  dataFrames[[name]] <- read.csv(file = csvfilepath, header = TRUE, sep = ",")
}
# stack all data frames row-wise with rbind(); merge() would join on key columns instead
# this only works if the data frames have the same columns
total <- do.call(rbind, dataFrames)
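
To show the read-and-combine step in isolation, here is a small sketch that runs on two throwaway csv files. The helper name read_and_stack, the demo folder and the column names are all made up by me. I am also assuming comma-separated files with a header row; adjust the read.csv() arguments if your files look different.

```r
# Hypothetical helper: read every csv in csvPaths and stack the rows
read_and_stack <- function(csvPaths) {
  stopifnot(length(csvPaths) > 0)
  dataFrames <- lapply(csvPaths, read.csv, stringsAsFactors = FALSE)
  # rbind() needs identical column names, so fail early with a clear message
  firstNames <- names(dataFrames[[1]])
  sameColumns <- vapply(dataFrames,
                        function(df) identical(names(df), firstNames),
                        logical(1))
  if (!all(sameColumns)) {
    stop("columns differ in: ",
         paste(basename(csvPaths[!sameColumns]), collapse = ", "))
  }
  do.call(rbind, dataFrames)
}

# Demo with two small csv files in a temporary folder
demoDir <- file.path(tempdir(), "csv_demo")
dir.create(demoDir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c("a", "b")),
          file.path(demoDir, "first.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")),
          file.path(demoDir, "second.csv"), row.names = FALSE)

total <- read_and_stack(list.files(demoDir, pattern = "\\.csv$",
                                   full.names = TRUE))
print(total)
```

Collecting the data frames in a list and calling rbind() once at the end also avoids growing a data frame inside the loop.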

Something possibly simpler:

  1. list.files() lists the files in a directory
  2. readr::read_csv() automatically decompresses files as needed
  3. dplyr::bind_rows() combines the data frames

So try:

lf <- list.files(pattern="\\.zip$")
dfs <- lapply(lf, readr::read_csv)
result <- dplyr::bind_rows(dfs)
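
The three lines above only find the zip files if the working directory is the download folder. A sketch of the same idea wrapped in a function that takes the folder as an argument could look like this. The function name read_zipped_csvs and the source_file column are my own choices, and it assumes the readr and dplyr packages are installed and that each zip holds a single csv, as in the question.

```r
# Hypothetical helper; needs the readr and dplyr packages when called
read_zipped_csvs <- function(folder) {
  # full.names = TRUE returns paths that work from any working directory
  lf <- list.files(folder, pattern = "\\.zip$", full.names = TRUE)
  dfs <- lapply(lf, readr::read_csv)
  # name each data frame after its file so bind_rows() can record the origin
  names(dfs) <- basename(lf)
  dplyr::bind_rows(dfs, .id = "source_file")
}

# Usage, with the folder from the local.folder example:
# result <- read_zipped_csvs("~/gdeltdata")
```

The extra source_file column tells you which zip each row came from, which can help if one file turns out to be malformed.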
