Is there an R function that lets me group a list of files by matching characters, but keeps them separate if they sit in a subdirectory?



My goal is to create a div with all the individual files separated by date. The approach I used from the answer to this question works for me; however, I have a problem in that some files with the same dates also show up. They sit in a separate subdirectory called "FailedToProcess" and get displayed in the same div, with their links shown as well. Is it possible to distinguish the two, so that files in the "FailedToProcess" subdirectory are not shown there but in another, separate div? Thanks

Here is my code:

library(dplyr)
library(stringr)

# create a vector of unique date ranges
(date_range_unique_vec <- str_sub(fname, start = 7, end = 23) %>% 
    unique())

for (each_date_range in date_range_unique_vec) {

  # extract group of file names for each unique date range
  group_fnames <- files[str_detect(files, each_date_range)]

  {
    html_block <- make_div(group_fnames, each_date_range)
    top <- readLines("header.html")
    bottom <- readLines("footer.html")

    # This will write just the div block
    write(x = html_block, file = paste0(each_date_range, "-block.html"))

    # This will write a working website
    write(x = c(top, "<body>", html_block, "</body>", bottom),
          file = paste0(each_date_range, "-website.html"))
  }

  cat(each_date_range, "\n")
  cat(group_fnames, "\n")
  cat("\n")
}

Edit:

files <- list.files(recursive = TRUE)
file_name <- strsplit(files, "/")
# extract the file names themselves
fname <- unlist(lapply(file_name, FUN = function(x) { 
  if (length(x) == 2) { x[2] } else { x[3] } }))
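
As a quick check, basename() returns the same file names as this strsplit()/lapply() approach for paths nested two or three levels deep (the example_files vector below is made up purely for illustration):

example_files <- c("1858/FailedToProcess/TOR-D-18580907-18580908.tif",
                   "1858/TOR-D-18580910-18580911.tif")
# file names via the split-based approach above
split_names <- unlist(lapply(strsplit(example_files, "/"),
                             FUN = function(x) { if (length(x) == 2) { x[2] } else { x[3] } }))
# file names via basename()
identical(split_names, basename(example_files))
# TRUE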

One way to achieve your goal is as follows.

The main difference from before is that there are two calls to the html-writing code inside the loop: first for the processed files, then for the unprocessed ones. The length() > 0 conditions simply make sure that an html file is only written when processed (or unprocessed) files actually exist for a given date range.

library(dplyr)
library(stringr)

# representative sample of files
files <- c("1858/FailedToProcess/TOR-D-18580907-18580908.tif", 
           "1858/FailedToProcess/TOR-D-18580907-18580908.tif-FailToProcess-Data.RDS",
           "1858/FailedToProcess/TOR-D-18580907-18580908.tif-FailToProcess-Plot.png",
           "1858/FailedToProcess/TOR-D-18580907-18580908.tif.png",
           "1858/FailedToProcess/TOR-D-18580908-18580909.tif",
           "1858/FailedToProcess/TOR-D-18580908-18580909.tif-FailToProcess-Data.RDS",
           "1858/FailedToProcess/TOR-D-18580908-18580909.tif-FailToProcess-Plot.png",
           "1858/FailedToProcess/TOR-D-18580908-18580909.tif.png",
           "1858/FailedToProcess/TOR-D-18580910-18580911.tif",
           "1858/FailedToProcess/TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS",
           "1858/TOR-D-18580910-18580911.tif",
           "1858/TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS",
           "1939/AGC-D-19390310-19390312.tif", 
           "1939/AGC-D-19390310-19390312.tif.png",
           "1939/AGC-D-19390310-19390312.tif-FailToProcess-Data.RDS",
           "1940/A06-D-19400306-19400306.tif", 
           "1940/A06-D-19400306-19400306.tif.png",
           "1940/A06-D-19400306-19400306.tif-FailToProcess-Data.RDS",
           "1941/A02-D-19410302-19410302.tif", 
           "1941/A02-D-19410302-19410302.tif.png",
           "1941/A02-D-19410302-19410302.tif-FailToProcess-Data.RDS")

# you can get the file name (without full path) with basename()
fname <- basename(files)

# create a vector of unique date ranges from the file names
(date_ranges_vec <- str_sub(basename(files), start = 7, end = 23) %>% 
    unique())

# since these do not change across dates, they can go outside the loop for speed
top <- readLines("header.html")
bottom <- readLines("footer.html")
for (each_date_range in date_ranges_vec) {

  # extract group of file names for each unique date range; processed first
  group_fnames <- files[str_detect(files, each_date_range) & !str_detect(files, "/FailedToProcess/")]

  # check that there is at least one file that meets the above conditions
  if (length(group_fnames) > 0) {
    cat(each_date_range, ": Writing processed.", "\n")
    html_block <- make_div(group_fnames, each_date_range)
    write(x = html_block, file = paste0(each_date_range, "-block.html"))
    write(x = c(top, "<body>", html_block, "</body>", bottom),
          file = paste0(each_date_range, "-website.html"))
  }

  # extract group of file names for each unique date range; not processed
  group_fnames_fail <- files[str_detect(files, each_date_range) & str_detect(files, "/FailedToProcess/")]

  # check that there is at least one file that meets the above conditions
  if (length(group_fnames_fail) > 0) {
    cat(each_date_range, ": Writing not processed.", "\n")
    html_block <- make_div(group_fnames_fail, each_date_range)
    # write to a distinct "-failed" name so a date range that has both processed
    # and failed files does not have its processed output overwritten
    write(x = html_block, file = paste0(each_date_range, "-failed-block.html"))
    write(x = c(top, "<body>", html_block, "</body>", bottom),
          file = paste0(each_date_range, "-failed-website.html"))
  }
}
18580907-18580908 : Writing not processed. 
18580908-18580909 : Writing not processed. 
18580910-18580911 : Writing processed. 
18580910-18580911 : Writing not processed. 
19390310-19390312 : Writing processed. 
19400306-19400306 : Writing processed. 
19410302-19410302 : Writing processed. 

With this setup the divs will be ordered by date. If you instead want all the processed files first and then all the unprocessed ones, you can do that by running two loops, one over only the processed files and another over the unprocessed files.
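
A minimal sketch of that two-pass variant is below. It assumes the same files, make_div(), top and bottom objects as in the code above; the write_blocks() helper and the "-failed" suffix are just illustrative choices, not fixed names:

# split once into processed and failed files, then write each set in its own pass
processed_files <- files[!str_detect(files, "/FailedToProcess/")]
failed_files    <- files[str_detect(files, "/FailedToProcess/")]

# helper: write one block/website pair per date range found in file_set
write_blocks <- function(file_set, suffix = "") {
  date_ranges <- unique(str_sub(basename(file_set), start = 7, end = 23))
  for (each_date_range in date_ranges) {
    group_fnames <- file_set[str_detect(file_set, each_date_range)]
    html_block <- make_div(group_fnames, each_date_range)
    write(x = html_block, file = paste0(each_date_range, suffix, "-block.html"))
    write(x = c(top, "<body>", html_block, "</body>", bottom),
          file = paste0(each_date_range, suffix, "-website.html"))
  }
}

write_blocks(processed_files)          # all processed divs first
write_blocks(failed_files, "-failed")  # then all failed divs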
