Building a data frame of flood metric calculations with a for loop in R



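I have a dataset called all.cols2 that contains water depth measurements at 94 sites, taken every 20 minutes over a little more than 3 years. Here is a preview: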

# A tibble: 89,714 x 95
date_time           Levee.slope      Levee.slope.1      Levee.slope.2    Levee.slope.3
<dttm>                         <dbl>            <dbl>            <dbl>            <dbl>
1 2015-12-01 15:05:33           -0.821           -0.539           -0.325          -0.0991
2 2015-12-01 15:25:33           -0.830           -0.548           -0.334          -0.108 
3 2015-12-01 15:45:33           -0.829           -0.547           -0.333          -0.107 
4 2015-12-01 16:05:33           -0.833           -0.551           -0.337          -0.111 
5 2015-12-01 16:25:33           -0.829           -0.547           -0.333          -0.107 
6 2015-12-01 16:45:33           -0.834           -0.552           -0.338          -0.112 
7 2015-12-01 17:05:33           -0.839           -0.557           -0.343          -0.117 
8 2015-12-01 17:25:33           -0.835           -0.553           -0.339          -0.113 
9 2015-12-01 17:45:33           -0.826           -0.544           -0.330          -0.104 
10 2015-12-01 18:05:33           -0.804           -0.522           -0.308          -0.0821
# ... with 89,704 more rows, and 90 more variables: Levee.slope.4 <dbl>,

I am calculating metrics for individual flood events at each site.

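Right now I use the for loop below to calculate these metrics one site at a time, export the results, and copy and paste them into an Excel file, which takes a very long time. Here is the code I have been using: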

for (col in 1:length(list.sites)) {
#Label and subset by site  
site <-  paste0("WaterLevel_",noquote(list.sites[[1]][i])) 
mut_sub <- all.cols2 %>% select("date_time",all_of(site))

# creates binary for positive/negative water level values 
mut_sub$VarA <- as.integer(mut_sub[,2] > 0) 

# This code is used to label flood events with unique streak_id
mut_sub <- mut_sub %>% mutate(lagged = lag(VarA))
mut_sub<-  mut_sub%>% mutate(start = (VarA != lagged)) 
mut_sub[1, "start"] <- FALSE 
#filter to keep positive water depths (VarA == 1)
mut_sub <- mut_sub %>% mutate(streak_id = cumsum(start)) %>%
filter(VarA == 1)

#calculate mean water depth
ls <- aggregate(mut_sub[,2], by= list(mut_sub$streak_id), FUN = mean, na.rm = TRUE) 

names(ls)[2] <- "avg_water_depth" 

#calculate max water depth
MAX <- aggregate(mut_sub[,2], by = list(mut_sub$streak_id), FUN = max, na.rm = TRUE)

names(MAX)[2] <- "max_depth"

#getting length (# of observations) of each event
obs <- aggregate(mut_sub[,2], by = list(mut_sub$streak_id), FUN = length)

names(obs)[2] <- "observations"

#calculating number of days per event (duration)
obs <- obs %>%
mutate(duration_days = (((observations-1)*20)/60)/24)

#Time interval: 
time <- mut_sub %>% group_by(streak_id) %>% summarise(begin = min(date_time), end = max(date_time))
time <- time %>% rename(Group.1 = streak_id)

#combine data
results1 <- inner_join(ls, MAX)
results2 <- inner_join(results1, obs)
final <- inner_join(results2, time)
#way to label sites
final$site = paste(site, final$Group.1, sep = "_")
}
###...repeat above for each survey point, export and add manually in excel 

This gives output like the following (from one site):

Group.1 avg_water_depth   max_depth observations duration_days      begin        end                        site
1     0.025245673 0.033995673            4    0.04166667 2016-02-09 2016-02-09  WaterLevel_Levee.slope.1_1
3     0.045995673 0.071995673            8    0.09722222 2016-05-06 2016-05-06  WaterLevel_Levee.slope.1_3
5     0.003995673 0.005995673            2    0.01388889 2016-05-06 2016-05-06  WaterLevel_Levee.slope.1_5
7     0.039370673 0.061995673            8    0.09722222 2016-05-07 2016-05-07  WaterLevel_Levee.slope.1_7
9     0.038785147 0.069995673           19    0.25000000 2016-05-27 2016-05-27  WaterLevel_Levee.slope.1_9
11     0.063817102 0.110995673           28    0.37500000 2016-05-27 2016-05-28 WaterLevel_Levee.slope.1_11
13     0.062817102 0.112995673           28    0.37500000 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_13
15     0.042495673 0.067995673           18    0.23611111 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_15

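...where each flood event at each site has a mean water depth, a maximum water depth, the number of observations, the duration of the flood event, and the start and end date/time.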

Right now I have to set i before running the for loop; it does not iterate through my sites automatically.

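My question is: is there a way to make the for loop go through all the sites in one run and store the results in a single combined output like the table above? Also, is there a way to condense the code inside the loop so that I don't have to create so many data frames?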

It is hard to demonstrate without some data, but here is pseudocode using foreach. If you want to speed things up, you can use doParallel.

library(dplyr)    # bind_rows()
library(foreach)  # foreach(), %do%

# Pseudocode: list_locations, data and one_result_df are placeholders.
data <- bind_rows(foreach(location = list_locations) %do% {
  # code handling data for one location
  # ...

  # process each column of one location
  one_location_df <- bind_rows(foreach(i_col = seq_along(data)) %do% {
    # your code handling data

    # the final value should be a data frame, even if it has only one row
    return(one_result_df)
  })

  # some additional code, if any
  # ...
  return(one_location_df)
})
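
To make the pseudocode above more concrete, here is a minimal sketch of how it might be filled in for the flood metrics in the question. It is only an illustration under assumptions: the toy tibble and its values are made up, the helper name flood_metrics is hypothetical, the site columns are assumed to be every column except date_time, and depths are assumed to contain no missing values. The helper computes all metrics in one summarise() call, so the intermediate data frames and joins are not needed.

```r
library(dplyr)
library(foreach)

# Toy stand-in for all.cols2 (made-up values): one timestamp column, two site columns.
toy <- tibble(
  date_time     = as.POSIXct("2015-12-01 15:05:33", tz = "UTC") + 20 * 60 * (0:9),
  Levee.slope   = c(-0.1, 0.2, 0.3, -0.1, -0.2, 0.1, 0.4, 0.2, -0.3, -0.1),
  Levee.slope.1 = c(0.1, 0.1, -0.2, -0.2, 0.3, -0.1, -0.1, 0.2, 0.2, 0.2)
)

# Hypothetical helper: event metrics for one site column in a single pipeline.
# Assumes a fixed sampling interval (step_min minutes) and no NA depths.
flood_metrics <- function(df, site, step_min = 20) {
  df %>%
    select(date_time, depth = all_of(site)) %>%
    mutate(
      flooded   = depth > 0,
      # the id increases by one every time the flooded state changes
      streak_id = cumsum(flooded != lag(flooded, default = first(flooded)))
    ) %>%
    filter(flooded) %>%
    group_by(streak_id) %>%
    summarise(
      avg_water_depth = mean(depth, na.rm = TRUE),
      max_depth       = max(depth, na.rm = TRUE),
      observations    = n(),
      duration_days   = (n() - 1) * step_min / 60 / 24,
      begin           = min(date_time),
      end             = max(date_time),
      .groups         = "drop"
    ) %>%
    mutate(site = paste(.env$site, streak_id, sep = "_"))
}

# Every column except the timestamp is treated as a site.
sites <- setdiff(names(toy), "date_time")

# foreach() with %do% returns a list of per-site tibbles; bind_rows() stacks them.
all_events <- bind_rows(
  foreach(site = sites) %do% flood_metrics(toy, site)
)
```

The combined all_events table has one row per flood event per site and could be written out once, for example with write.csv(), instead of being pasted into Excel site by site.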

Note: if you use doParallel, avoid wrapping one %dopar% inside another %dopar%; otherwise it leads to a memory leak and nothing gets done.
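
One way to follow that advice is to parallelise only the outer loop and keep the inner loop sequential with %do%. The sketch below shows the pattern with placeholder work; the location names, the two-worker cluster and the squared value are arbitrary assumptions for illustration, not part of the original data.

```r
library(foreach)
library(doParallel)

# Start a small cluster; two workers is an arbitrary choice for this sketch.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

locations <- c("A", "B", "C")  # placeholder location names

# Outer loop runs in parallel; .packages makes foreach available on the workers.
result <- foreach(loc = locations, .combine = rbind, .packages = "foreach") %dopar% {
  # Inner loop stays sequential (%do%), so no %dopar% is nested inside another.
  foreach(k = 1:3, .combine = rbind) %do% {
    data.frame(location = loc, k = k, value = k^2)
  }
}

parallel::stopCluster(cl)
```

Whether the parallel version is actually faster depends on how much work each location involves, since starting workers and copying data to them has its own cost.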
