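I have a dataset named all.cols2 that holds water depth measurements at 94 sites, taken every 20 minutes over a little more than 3 years. Here is a preview: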
# A tibble: 89,714 x 95
date_time Levee.slope Levee.slope.1 Levee.slope.2 Levee.slope.3
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-12-01 15:05:33 -0.821 -0.539 -0.325 -0.0991
2 2015-12-01 15:25:33 -0.830 -0.548 -0.334 -0.108
3 2015-12-01 15:45:33 -0.829 -0.547 -0.333 -0.107
4 2015-12-01 16:05:33 -0.833 -0.551 -0.337 -0.111
5 2015-12-01 16:25:33 -0.829 -0.547 -0.333 -0.107
6 2015-12-01 16:45:33 -0.834 -0.552 -0.338 -0.112
7 2015-12-01 17:05:33 -0.839 -0.557 -0.343 -0.117
8 2015-12-01 17:25:33 -0.835 -0.553 -0.339 -0.113
9 2015-12-01 17:45:33 -0.826 -0.544 -0.330 -0.104
10 2015-12-01 18:05:33 -0.804 -0.522 -0.308 -0.0821
# ... with 89,704 more rows, and 90 more variables: Levee.slope.4 <dbl>,
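I am calculating metrics for individual flood events at each site.
Right now I use the for loop below to calculate these metrics one site at a time, export the results, and copy and paste them into an Excel file, which takes a long time. Here is the code I have been using: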
for (col in 1:length(list.sites)) {
  #Label and subset by site
  site <- paste0("WaterLevel_", noquote(list.sites[[1]][i]))
  mut_sub <- all.cols2 %>% select("Date", all_of(site))
  # creates binary for positive/negative water level values
  mut_sub$VarA <- as.integer(mut_sub[,2] > 0)
  # This code is used to label flood events with unique streak_id
  mut_sub <- mut_sub %>% mutate(lagged = lag(VarA))
  mut_sub <- mut_sub %>% mutate(start = (VarA != lagged))
  mut_sub[1, "start"] <- FALSE
  #filter to keep positive water depths (VarA == 1)
  mut_sub <- mut_sub %>% mutate(streak_id = cumsum(start)) %>%
    filter(VarA == 1)
  #calculate mean water depth
  ls <- aggregate(mut_sub[,2], by = list(mut_sub$streak_id), FUN = mean, na.rm = TRUE)
  names(ls)[2] <- "avg_water_depth"
  #calculate max water depth
  MAX <- aggregate(mut_sub[,2], by = list(mut_sub$streak_id), FUN = max, na.rm = TRUE)
  names(MAX)[2] <- "max_depth"
  #getting length (# of observations) of each event
  obs <- aggregate(mut_sub[,2], by = list(mut_sub$streak_id), FUN = length)
  names(obs)[2] <- "observations"
  #calculating number of days per event (duration)
  obs <- obs %>%
    mutate(duration_days = (((observations-1)*20)/60)/24)
  #Time interval:
  time <- mut_sub %>% group_by(streak_id) %>% summarise(begin = min(date_time), end = max(date_time))
  time <- time %>% rename(Group.1 = streak_id)
  #combine data
  results1 <- inner_join(ls, MAX)
  results2 <- inner_join(results1, obs)
  final <- inner_join(results2, time)
  #way to label sites
  final$site = paste(site, final$Group.1, sep = "_")
}
###...repeat above for each survey point, export and add manually in excel
This gives output like the following (from one site):
Group.1 avg_water_depth max_depth observations duration_days begin end site
1 0.025245673 0.033995673 4 0.04166667 2016-02-09 2016-02-09 WaterLevel_Levee.slope.1_1
3 0.045995673 0.071995673 8 0.09722222 2016-05-06 2016-05-06 WaterLevel_Levee.slope.1_3
5 0.003995673 0.005995673 2 0.01388889 2016-05-06 2016-05-06 WaterLevel_Levee.slope.1_5
7 0.039370673 0.061995673 8 0.09722222 2016-05-07 2016-05-07 WaterLevel_Levee.slope.1_7
9 0.038785147 0.069995673 19 0.25000000 2016-05-27 2016-05-27 WaterLevel_Levee.slope.1_9
11 0.063817102 0.110995673 28 0.37500000 2016-05-27 2016-05-28 WaterLevel_Levee.slope.1_11
13 0.062817102 0.112995673 28 0.37500000 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_13
15 0.042495673 0.067995673 18 0.23611111 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_15
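...where each flood event at each site has a mean water depth, a maximum water depth, the number of observations, the duration of the event, and the start and end date/time.
Right now I have to set i by hand before running the for loop; it does not step through my sites automatically.
My question: is there a way to have the for loop iterate over all the sites in one go and store the results in a single combined output like the table above? Also, is there a way to condense the code inside the loop so that I do not have to create so many data frames?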
It is hard to demonstrate without some data, but here is pseudocode using foreach. If you want to speed things up, you can use doParallel.
library(dplyr)    # bind_rows()
library(foreach)  # foreach(), %do%

data <- bind_rows(foreach(location = list_locations) %do% {
  # code handling data for one location
  # ...
  # process each column of one location
  # (location_df is a placeholder for that location's data frame)
  one_location_df <- bind_rows(foreach(i_col = seq_along(location_df)) %do% {
    # your code handling data
    # the last value should be a data frame, even if it has only one row
    one_result_df
  })
  # some additional code, if any
  # ...
  one_location_df
})
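To make the pseudocode above concrete, here is a minimal sketch on toy data. The data frame toy, the column names site_a and site_b, and the helper summarise_site() are made up for illustration and are not from the question. It also assumes the depth columns contain no missing values, since an NA would break the cumsum() that labels the streaks.

```r
library(dplyr)
library(foreach)

# Toy stand-in for the real data: 20-minute readings at two hypothetical sites.
toy <- tibble(
  date_time = as.POSIXct("2016-01-01 00:00:00", tz = "UTC") + 1200 * (0:7),
  site_a = c(-0.1, 0.2, 0.4, -0.3, -0.2, 0.1, 0.3, 0.5),
  site_b = c(0.1, 0.1, -0.2, -0.1, 0.3, -0.4, -0.1, -0.2)
)

# Hypothetical helper: summarise every flood event (run of depths > 0)
# for the single site column named by site_col.
summarise_site <- function(df, site_col) {
  df %>%
    transmute(date_time, depth = .data[[site_col]], wet = depth > 0) %>%
    # A new streak starts whenever the wet/dry state flips.
    mutate(streak_id = cumsum(wet != lag(wet, default = first(wet)))) %>%
    filter(wet) %>%
    group_by(streak_id) %>%
    summarise(
      avg_water_depth = mean(depth, na.rm = TRUE),
      max_depth = max(depth, na.rm = TRUE),
      observations = n(),
      begin = min(date_time),
      end = max(date_time),
      .groups = "drop"
    ) %>%
    # Same duration formula as in the question (20-minute sampling interval).
    mutate(
      duration_days = (observations - 1) * 20 / 60 / 24,
      site = site_col
    )
}

# Every column except the timestamp is treated as a site.
site_cols <- setdiff(names(toy), "date_time")

# One data frame per site, stacked into a single combined table.
events <- bind_rows(foreach(s = site_cols) %do% {
  summarise_site(toy, s)
})

print(events)
```

Doing all the metrics in one summarise() call removes the need for the separate aggregate() results and the chain of inner_join() calls, and the site column keeps the rows identifiable after they are stacked.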
Note: if you use doParallel, avoid wrapping one %dopar% around another %dopar%; that causes memory leaks and nothing will work.
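One way to follow that advice, sketched under assumptions: run only the outer loop with %dopar% and keep the inner loop sequential with %do%. The worker count of 2 and the toy computation (summing squares) are arbitrary choices for illustration, and the example assumes the doParallel package is installed.

```r
library(foreach)
library(doParallel)

# Register a small local cluster; 2 workers is an arbitrary choice.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Outer loop runs in parallel across workers.
# .packages = "foreach" loads foreach on each worker so %do% is available there.
totals <- foreach(group = list(1:3, 4:6), .combine = c,
                  .packages = "foreach") %dopar% {
  # Inner loop stays sequential inside each worker.
  squares <- foreach(x = group, .combine = c) %do% {
    x^2
  }
  sum(squares)
}

# Always release the workers when done.
parallel::stopCluster(cl)

print(totals)
```

For the flood-event problem, the natural split would be one site per parallel task, with all per-event work done sequentially inside it.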