Apply a custom function to multiple files and write a unique CSV output for each in R



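I am a beginner in R and have been putting together some code to create a custom function that performs a specific task on my data. The function identifies missing data in a CSV file and patches it with the mean value. After that, I want to summarize the data by year and month and export it as a CSV file. I have multiple CSV files in one folder and want to perform this task on each of them. So far I have been able to get code that performs the task at hand, but I don't know how to write a unique output for each processed CSV file and save it to a new folder. I would also like to keep the original file name, followed by "_processed", in the name of each output file. In addition, any suggestions on how to improve this code are very welcome. Thanks in advance.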

# Load all packages required by the script
library(tidyverse) # data science package
library(lubridate) # work with dates
library(dplyr)     # data manipulation (filter, summarize, mutate)
library(ggplot2)   # graphics
library(gridExtra) # tile several plots next to each other
library(scales)
# Set the working directory #
setwd("H:/Shaeden_Post_Doc/Genus_Exchange/GEE_Data/MODIS_Product_Data_Raw/Cold_Temperate_Moist")

#create a function to summarize data by year and month
#patch missing values using the average
summarize_by_month = function(df){

  # counting unique, missing and mean values in the ET column
  df %>% summarise(n = n_distinct(ET),
                   na = sum(is.na(ET)),
                   med = mean(ET, na.rm = TRUE))

  # assign mean values to the missing data and modify the dataframe
  df = df %>%
    mutate(ET = replace(ET, is.na(ET), mean(ET, na.rm = TRUE)))
  df

  #separate data into year, month and day
  df$date = as.Date(df$date, format = "%Y/%m/%d")
  #summarize by year and month
  df %>%
    mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
    group_by(year, month) %>%
    summarise(mean_monthly = mean(ET))
}
#import all files and execute custom function for each
file_list = list.files(pattern="AET", full.names=TRUE)
file_list
my_AET_files = lapply(file_list, read_csv)
monthly_AET = lapply(my_AET_files, summarize_by_month)
monthly_AET 

A sample dataset is available here: https://drive.google.com/drive/folders/1pLHt-vT87lxzW2We-AS1PwVcne3ALP2d?usp=sharing

You can read the file, manipulate the data, and write the CSV all within the same function:

library(dplyr)

summarize_by_month = function(file) {
  df <- readr::read_csv(file)
  # assign mean values to the missing data and modify the dataframe
  df = df %>% mutate(ET = replace(ET, is.na(ET), mean(ET, na.rm = TRUE)))
  # convert the date column to Date class
  df$date = as.Date(df$date, format = "%Y/%m/%d")
  # summarize by year and month
  new_df <- df %>%
    mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
    group_by(year, month) %>%
    summarise(mean_monthly = mean(ET), .groups = "drop")

  # write.csv does not create folders, so make sure the output folder exists
  dir.create('output_folder', showWarnings = FALSE)
  write.csv(new_df, sprintf('output_folder/%s_processed.csv',
                            tools::file_path_sans_ext(basename(file))), row.names = FALSE)
  # return the summary so the lapply() result holds the data frames
  new_df
}
monthly_AET = lapply(file_list, summarize_by_month)
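The output name in the function above is built from the input path. As a minimal sketch of just that step, assuming a made-up input path (`./data/site01_AET.csv` is hypothetical and not part of the sample data), the pieces fit together like this:

```r
# Hypothetical input path, used only to illustrate how the name is built
file <- "./data/site01_AET.csv"

# basename() drops the directory part; file_path_sans_ext() drops ".csv"
stem <- tools::file_path_sans_ext(basename(file))

# Assumed output folder name, matching the 'output_folder' used above
out_dir <- "output_folder"
out_file <- file.path(out_dir, sprintf("%s_processed.csv", stem))
out_file
# "output_folder/site01_AET_processed.csv"
```

Because the name is derived from each input file, every processed file gets its own output and nothing is overwritten, as long as the input file names are themselves unique.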
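
library(readr)   # write_excel_csv
library(stringr) # str_sub
path <- "your_preferred_path/" # set the path where you want to save the files
x <- list.files(pattern = "your_pattern") # character vector of your file names
name <- str_sub(x, start = xL, end = yL) # xL and yL are placeholders: the start and end positions of the part of the name you want to keep
# monthly_AET is assumed to be a list of data frames, one per file in x
for (i in seq_along(monthly_AET)) {
  # [[i]] extracts the data frame itself; name[i] picks the matching file name
  write_excel_csv(monthly_AET[[i]], paste0(path, name[i], "_processed.csv")) # paste0 builds custom names from variables and static strings
}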

Note: this is only a rough suggestion and may need to be adjusted to fit your needs.
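If you keep a function that returns the summarised data frames, one way to pair each result with its source file is to name the list elements after the input files and loop over those names. The following is a sketch in base R under stated assumptions: the two data frames and the file names `siteA_AET.csv` and `siteB_AET.csv` are made-up stand-ins, and the output goes to a temporary folder rather than a real one.

```r
# Toy stand-ins for the summarised data frames (made-up values)
monthly_AET <- list(
  data.frame(year = "2001", month = c("01", "02"), mean_monthly = c(1.5, 2.0)),
  data.frame(year = "2001", month = c("01", "02"), mean_monthly = c(0.7, 0.9))
)
# Hypothetical input file names, in the same order as the list above
file_list <- c("./siteA_AET.csv", "./siteB_AET.csv")

# Name each result after its source file, minus directory and extension
names(monthly_AET) <- tools::file_path_sans_ext(basename(file_list))

# Write into a temporary folder here; replace with your real output folder
out_dir <- file.path(tempdir(), "processed")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

for (nm in names(monthly_AET)) {
  out_file <- file.path(out_dir, paste0(nm, "_processed.csv"))
  write.csv(monthly_AET[[nm]], out_file, row.names = FALSE)
}

list.files(out_dir)
```

Naming the list keeps the file name and its data together, so there is no separate index to keep in sync between the names and the results.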