r - Time series and dplyr, event detection (date-change problem)



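I hope someone can help me with the following event-detection problem. The input data is a (regular) time series containing "Time", water level ("WL") and runoff ("Q"). The goal is to detect events above a threshold and extract the event's start time, end time, duration (in minutes), and the maximum and sum values during the event. By definition, every event must be cut when the date changes. In contrast, NAs should only cut an event when the missing values last longer than one hour.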

library(data.table)
library(dplyr)
library(xts)
## data
dWL <- structure(list(Time = structure(c(1463951500, 1463951800, 1463952100, 1463952400, 1463952700, 1463953000, 1463953300, 1463953600, 1463953900, 1463954200, 1463954500, 1463954800, 1463955100, 1463955400, 1463955700, 1463956000),class = c("POSIXct", "POSIXt"), tzone = ""), WL = c(0.2, 2.5, 2.4, 2.1, 0.9, 2.8, 2.9, 1.9, 2.4, NA, 2.3, 2.6, 2.8, 2.1, 2.0, 1.9), Q = c(0.0, 255.5, 232.4, 150.1, 0.0, 345.8, 382.9, 0.0, 214.4, NA, 201.3, 312.6, 362.8, 80.1, 20.0, 0.0)), row.names = c(NA, -16L), class = "data.frame")
## threshold value
vth <-2

na.omit(dWL) %>%  ## ?? how to drop NAs only when the NA duration is longer than an hour ??
  mutate(tmp_WL = WL >= vth, id = rleid(tmp_WL)) %>%
  filter(tmp_WL) %>%
  group_by(id) %>%  ## ?? how to additionally separate events at a change of date ??
  summarise(start_time = first(Time), end_time = last(Time),
            event_duration = difftime(last(Time), first(Time)),
            max_Q = max(Q), sum_Q = sum(Q))

I know the heatwaveR package has a very useful exceedance() function, but I haven't managed to make it work for sub-daily time series yet.

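Since you tagged this with data.table, let's use it. We can track events with run-length IDs from rleidv() (the vector form of rleid()). Once every run has an ID, a simple grouping by that ID does the calculations. Finally, we drop the RLE column by setting it to NULL and use [] to print the result.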

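library(data.table)
setDT(dWL)[!is.na(WL), event := WL > vth][    # flag rows above the threshold; NA rows keep event = NA
  , RLE := rleidv(event)][                    # run-length id: each run of equal flags shares one id
  event == TRUE, .(start    = min(Time),      # summarise only the above-threshold runs
                   end      = max(Time),
                   max.WL   = max(WL),
                   duration = difftime(max(Time), min(Time)),
                   runoff   = sum(Q)),
  by = RLE][, RLE := NULL][]                  # drop the helper id and print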
#                 start                 end max.WL duration runoff
#1: 2016-05-22 17:16:40 2016-05-22 17:26:40    2.5  10 mins  638.0
#2: 2016-05-22 17:36:40 2016-05-22 17:41:40    2.9   5 mins  728.7
#3: 2016-05-22 17:51:40 2016-05-22 17:51:40    2.4   0 mins  214.4
#4: 2016-05-22 18:01:40 2016-05-22 18:16:40    2.8  15 mins  956.8
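The split between events 3 and 4 comes from how run-length IDs treat NA. A minimal check, assuming only data.table: rleidv() gives consecutive equal values the same ID, and an NA counts as a value of its own, so the NA row starts a new run and separates the TRUE values on either side of it. The vector flags is a made-up illustration, not taken from the data.

```r
library(data.table)

# Made-up flags: two TRUEs, then an NA (a missing reading), then another TRUE
flags <- c(FALSE, TRUE, TRUE, NA, TRUE)

# Consecutive equal values share an id; the NA gets its own id,
# so the final TRUE ends up in a different run than the first two
rleidv(flags)
```

This is why the answer's code cuts an event at every NA row, regardless of how long the gap is.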

Data

dWL <- structure(list(Time = structure(c(1463951500, 1463951800, 1463952100, 1463952400, 1463952700, 1463953000, 1463953300, 1463953600, 1463953900, 1463954200, 1463954500, 1463954800, 1463955100, 1463955400, 1463955700, 1463956000),class = c("POSIXct", "POSIXt"), tzone = ""), WL = c(0.2, 2.5, 2.4, 2.1, 0.9, 2.8, 2.9, 1.9, 2.4, NA, 2.3, 2.6, 2.8, 2.1, 2.0, 1.9), Q = c(0.0, 255.5, 232.4, 150.1, 0.0, 345.8, 382.9, 0.0, 214.4, NA, 201.3, 312.6, 362.8, 80.1, 20.0, 0.0)), row.names = c(NA, -16L), class = "data.frame")
vth <- 2
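The code above cuts an event at every NA row and groups only by the run ID, so it does not yet apply the two rules from the question. Below is a minimal sketch that does, under stated assumptions. detect_events() is a hypothetical helper. NA rows are dropped and a gap is measured between consecutive non-NA timestamps, so "NA duration longer than an hour" is taken to mean a gap above max_gap_mins = 60. The date is the calendar day of Time in its own time zone; since the sample data has tzone = "", that is the session's local zone, so results on dWL can differ between machines. Like the answer, it uses WL > vth rather than >=.

```r
library(data.table)

# Hypothetical helper: runs of WL > vth, cut at a change of calendar date
# and at gaps between valid readings longer than max_gap_mins.
detect_events <- function(df, vth, max_gap_mins = 60) {
  dt <- as.data.table(df)[!is.na(WL)]  # copy; drop NA rows, gaps come from Time
  setorder(dt, Time)
  dt[, `:=`(
    above = WL > vth,
    day   = format(Time, "%Y-%m-%d"),  # calendar date in Time's time zone
    gap   = as.numeric(difftime(Time, shift(Time), units = "mins"))
  )]
  # A new run starts at the first row, when the above/below state flips,
  # when the date changes, or after a gap longer than max_gap_mins
  dt[, run := cumsum(is.na(gap) | above != shift(above) |
                       day != shift(day) | gap > max_gap_mins)]
  dt[above == TRUE, .(
    start    = min(Time),
    end      = max(Time),
    duration = difftime(max(Time), min(Time), units = "mins"),
    max.WL   = max(WL),
    max.Q    = max(Q),
    sum.Q    = sum(Q)
  ), by = run][, run := NULL][]
}
```

Usage: detect_events(dWL, vth). In the time zone used for the output above, the single NA row is only a 10-minute gap on the same day, so the last two events there are merged into one running from 17:51:40 to 18:16:40. In a zone where midnight falls between those readings, the date rule keeps them apart.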
