R将excel与不兼容的列相结合



我将多个excel文件与一对多工作表组合在一起。它们各自有不同的列。我只想把工作表和地址信息结合起来。如果工作表没有地址信息,我需要在生成的组合文件中记下它。。如果我有问题,其中一张工作表有蔬菜,没有地址,另一张有地址信息。。我正在使用下面的代码将它们组合在一起。在我把它投入使用后,我会把它们清理、标准化并放在一起。

dir_path <- "C:/temp/ConsigneeList/stuff4/"         # target directory where the xlsx files are located. 
re_file <- list.files(dir_path, pattern=".xls*")    # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc.
read_sheets <- function(dir_path, file){
xls_file <- paste0(dir_path, file)
xls_file %>%
excel_sheets() %>%
set_names() %>%
map_df(read_excel, path = xls_file, .id = 'sheet_name') %>% 
mutate(file_name = file) %>% 
select(file_name, sheet_name, everything())
}
number_of_excel_files<-length(file.list)
mybiggerlist<-vector('list',number_of_excel_files)
for(file in 1:length(mybiggerlist)) {
mybiggerlist[[file]]<-read_sheets(dir_path, file.list[file])

}

我收到错误:错误:无法组合Customer Quick REF$Order NoCH Belt$Order No。我尝试使用%>%mutate_all(as.character(作为列在本质上都应该是字符。。关于我该如何解决这个问题,有什么想法吗?或者,是否有一种方法可以跳过导入有问题的数据,并创建一行指示该工作表存在问题?非常感谢。

试试这样的东西:

dir_path <- "C:/temp/ConsigneeList/stuff4/"         # target directory where the xlsx files are located. 
re_file <- list.files(dir_path, pattern=".xls*")    # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc.
read_sheets <- function(dir_path, file){
xls_file <- paste0(dir_path, file)
sheets <- xls_file %>%
excel_sheets() %>%
set_names() %>% ## not really sure if this is doing anything?
map(read_excel, path = xls_file)

# Now we have all the sheets in a list. 
# Time to figure out which ones to combine
# Use purrr::keep to only keep sheets that meet some condition
# I just put in a wild guess, edit the test so that only sheets
# you want are kept
sheets <- purrr::keep(sheets, ~ "Address" %in% names(.))    

bind_rows(sheets, .id = 'sheet_name') %>%
mutate(file_name = file) %>% 
select(file_name, sheet_name, everything())
}

最新更新