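I am trying to create a combined dataset from about 117 other csv files in R Markdown. I managed to combine them into one object using the following functions: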
setwd()
dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)
dataFilesCombined <- data.table::rbindlist(dataFiles)
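However, I would like to add a column at the end of "dataFilesCombined" that records which original .csv file each data value came from. Could anyone give me a suggestion?
I tried looking for answers elsewhere online, but I couldn't find anything that works well with the Sys.glob approach.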
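I created a small reprex using 5 data frames, but it should work with 117 as well. Before the data frames in the list are combined into one big data frame, each of them needs its own identifier that indicates which .csv file it came from. The easiest way to do that is to use the name of the .csv file itself.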
library(data.table)
# there's a folder called "reprex" in my Documents that contains
# five dataframes that look like this
testdata <- structure(list(x1 = 3:5, x2 = c(4L, 2L, 5L), x3 = c(1L, 1L, 2L)), class = "data.frame", row.names = c(NA, -3L))
testdata
#> x1 x2 x3
#> 1 3 4 1
#> 2 4 2 1
#> 3 5 5 2
# make path
path <- "~/Documents/reprex"
# get names of the dataframes, put into character vector
filelist <- list.files(path = path,
pattern = "\\.csv$",  # list.files() takes a regular expression, not a glob
full.names = TRUE)
# put all dataframes into a list
lst <- lapply(filelist,
utils::read.csv,
header = TRUE,
stringsAsFactors = FALSE)
# make a name for every dataframe, based on filelist
names(lst) <- filelist
namelist <- fs::path_file(filelist)
namelist <- unlist(lapply(namelist,
sub,
pattern = "\\.csv$",  # escape the dot and anchor at the end of the name
replacement = ""),
use.names = FALSE)
print(namelist)
#> [1] "data1" "data2" "data3" "data4" "data5"
# give every dataframe in the list an ID variable,
# which is actually the original name of the .csv file
lst <- mapply(cbind, lst, "listID" = namelist, SIMPLIFY = FALSE)
# combine
dataFilesCombined <- data.table::rbindlist(lst)
head(dataFilesCombined)
#> x1 x2 x3 listID
#> 1: 3 4 1 data1
#> 2: 4 2 1 data1
#> 3: 5 5 2 data1
#> 4: 3 4 1 data2
#> 5: 4 2 1 data2
#> 6: 5 5 2 data2
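As a shorter variant of the same idea, data.table::rbindlist() has an idcol argument that turns the names of the list into a column, so the mapply/cbind step can be skipped. The sketch below is only an illustration under a few assumptions: the temporary folder, the five generated files and the object names (tmp, files, combined) are made up for the example, and it sticks with Sys.glob() as in the question.

library(data.table)

# Hypothetical setup: write five small CSV files into a temporary folder
tmp <- file.path(tempdir(), "reprex_idcol")
dir.create(tmp, showWarnings = FALSE)
testdata <- data.frame(x1 = 3:5, x2 = c(4L, 2L, 5L), x3 = c(1L, 1L, 2L))
for (i in 1:5) {
  write.csv(testdata, file.path(tmp, paste0("data", i, ".csv")), row.names = FALSE)
}

# Collect the file paths with Sys.glob(), as in the question
files <- Sys.glob(file.path(tmp, "data*.csv"))

# Read every file and name each list element after its file,
# dropping the folder and the .csv extension
lst <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(lst) <- tools::file_path_sans_ext(basename(files))

# idcol = "listID" stores the list names in a new column called listID
combined <- data.table::rbindlist(lst, idcol = "listID")

# rbindlist() puts the id column first; move it to the end, as asked
data.table::setcolorder(combined, c(setdiff(names(combined), "listID"), "listID"))

For the real data, the setup part would be dropped and files would point at the 117 existing .csv files.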