我每周都会收到相同的数据,这些数据在excel选项卡上具有相同的名称。我需要首先将所有具有相同名称的表分组,并在将它们组合在一起之前对它们进行操作。例如:
Week1.xlsx (workbook)
- Blue (sheet) - has a column named type that just says "Color"
- Red (sheet) - has a column named type that just says "Color"
- Green (sheet)` - has a column named type that just says "Color"
Week2.xlsx (workbook)
- Blue (sheet) - has a column named type that just says "Color"
- Red (sheet) - has a column named type that just says "Color"
- Green (sheet) - has a column named type that just says "Color"
等。
我需要将所有常见的表组合在一起,并修改type
列,以说明它来自哪个颜色表。如果我将它们组合在一起,那么type
列将只使用& Color"填充,因此工作表的名称是告诉我如何组织它的唯一信息。
最终输出:all-data.xlsx
Type
Color - Blue
Color - Red
Color - Blue
Color - Green
.
.
.
我不一定需要第二部分(修改列)的帮助,我只是想知道如何将所有常见的工作表组合在一起(可能在一个列表中),我可以通过名称访问。
EDIT我不知道如何把它标记为副本,但我只是提交给审查。更多答案请参阅下面的帖子。
好吧,我从这篇文章中找到了一个答案,完美地总结了这个问题。具体来说,.id = 'sheet_name'
部分通过将每一行分配给工作表的名称来实现我的目标。
library(tidyverse)
library(readxl)
dir_path <- "~/test_dir/" # target directory path where the xlsx files are located.
re_file <- "^test[0-9]\.xlsx" # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc, but could simply be 'xlsx'.
read_sheets <- function(dir_path, file){
xlsx_file <- paste0(dir_path, file)
xlsx_file %>%
excel_sheets() %>%
set_names() %>%
map_df(read_excel, path = xlsx_file, .id = 'sheet_name') %>%
mutate(file_name = file) %>%
select(file_name, sheet_name, everything())
}
df <- list.files(dir_path, re_file) %>%
map_df(~ read_sheets(dir_path, .))