Adding an "Identify Original .csv" column in R Markdown



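I am trying to build a combined dataset in R Markdown from about 117 other csv files. I managed to combine them into a single object with the following code: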

setwd("path/to/csv/folder")  # placeholder: the folder containing the csv files
dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)
dataFilesCombined <- data.table::rbindlist(dataFiles)

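However, I would like to add a column at the end of "dataFilesCombined" that records which original .csv file each data value came from. Can anyone give me a suggestion?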

I tried looking for an answer elsewhere online, but I couldn't find anything that works well with the Sys.glob approach.

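I built a small reproducible example with 5 data frames, but it should work just as well with 117. Before the data frames in the list are combined into one big data frame, each one needs its own identifier showing which .csv file it came from. The simplest way to do that is to use the name of the .csv file itself.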

library(data.table)
# there's a folder called "reprex" in my Documents that contains 
# five dataframes that look like this
testdata <- structure(list(x1 = 3:5, x2 = c(4L, 2L, 5L), x3 = c(1L, 1L, 2L)), class = "data.frame", row.names = c(NA, -3L))
testdata
#>   x1 x2 x3
#> 1  3  4  1
#> 2  4  2  1
#> 3  5  5  2
# make path
path <- "~/Documents/reprex"
# get names of the dataframes, put into character vector
# `pattern` is a regular expression, not a glob: match names ending in ".csv"
filelist <- list.files(path = path,
                       pattern = "\\.csv$",
                       full.names = TRUE)
# put all dataframes into a list
lst <- lapply(filelist,
              utils::read.csv,
              header = TRUE,
              stringsAsFactors = FALSE)
# make a name for every dataframe, based on filelist
names(lst) <- filelist
namelist <- fs::path_file(filelist)
# strip the ".csv" extension; sub() is vectorized, and the dot must be escaped
namelist <- sub(pattern = "\\.csv$", replacement = "", x = namelist)
print(namelist)
#> [1] "data1" "data2" "data3" "data4" "data5"
# give every dataframe in the list an ID variable,
# which is actually the original name of the .csv file
lst <- mapply(cbind, lst, "listID" = namelist, SIMPLIFY = FALSE)
# combine
dataFilesCombined <- data.table::rbindlist(lst)
head(dataFilesCombined)
#>    x1 x2 x3 listID
#> 1:  3  4  1  data1
#> 2:  4  2  1  data1
#> 3:  5  5  2  data1
#> 4:  3  4  1  data2
#> 5:  4  2  1  data2
#> 6:  5  5  2  data2
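
As a possible alternative that stays closer to the Sys.glob call in the question, here is a minimal sketch that relies on the `idcol` argument of `data.table::rbindlist()`: when the list passed in is named, the names are written into the ID column. The temporary folder, the three demo files, and the column name `listID` are assumptions made for illustration only; in practice `files` would come from your own `Sys.glob("data*.csv")` call.

```r
library(data.table)

# Hypothetical demo setup: write three small csv files into a temporary folder
demo_dir <- file.path(tempdir(), "reprex_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(x1 = 3:5, x2 = c(4L, 2L, 5L), x3 = c(1L, 1L, 2L)),
            file.path(demo_dir, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# Sys.glob() takes a wildcard (glob) pattern, not a regular expression
files <- Sys.glob(file.path(demo_dir, "data*.csv"))

# name each path after its file, without the directory or the extension
names(files) <- tools::file_path_sans_ext(basename(files))

# lapply() keeps the names, so the result is a named list of data frames
lst <- lapply(files, read.csv)

# idcol = "listID" stores each list element's name in a new column
combined <- rbindlist(lst, idcol = "listID")

# rbindlist() puts the ID column first; move it to the end as requested
setcolorder(combined, c(setdiff(names(combined), "listID"), "listID"))
```

This avoids the separate `mapply(cbind, ...)` step, at the cost of depending on the list being named before it is combined.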
