R - Filtering with a variable time interval using dplyr



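I want to filter a time series based on a variable time interval. More specifically, consider the time t_i of the time stamp t on each day i. I want to filter my time series so that it only contains time stamps from t_i - 15 minutes up to and including t_i + 15 minutes, for the days in a moving window of mv days up to t.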

Here is what I tried:

library(lubridate)
library(dplyr)
mv <- 2 # moving window
t <- as.POSIXct("2020-06-20 12:00", tz="UTC") # time stamp
time <- seq(ymd_hm('2020-01-01 00:00'),ymd_hm('2020-12-31 23:45'), by = '15 mins')
ts <- tibble(time=time, data=sin(seq(1,length(time),1)))
# What I did:
ts %>%
filter(time >= t - mv*24*60*60) %>%
filter(time <= t) %>%
filter(strftime(time, format = "%H:%M", tz = "UTC") >= strftime(t-15*60, format = "%H:%M", tz = "UTC")) %>%
filter(strftime(time, format = "%H:%M", tz = "UTC") <= strftime(t+15*60, format = "%H:%M", tz = "UTC"))
Output:
# A tibble: 7 x 2
time                   data
<dttm>                <dbl>
1 2020-06-18 12:00:00 -0.435 
2 2020-06-18 12:15:00  0.523 
3 2020-06-19 11:45:00  0.298 
4 2020-06-19 12:00:00  0.964 
5 2020-06-19 12:15:00  0.744 
6 2020-06-20 11:45:00  0.885 
7 2020-06-20 12:00:00  0.0870

This is exactly what I want, but when t <- as.POSIXct("2020-06-20 23:45", tz="UTC") (the same happens with 00:00), I get:

# A tibble: 0 x 2
# … with 2 variables: time <dttm>, data <dbl>

I added an if-else statement to work around this, but it is far from elegant and still does not give me what I want:

t <- as.POSIXct("2020-06-20 23:45", tz="UTC") # time stamp
if(strftime(t, format = "%H:%M", tz = "UTC") %in% c("23:45","00:00")){
ts %>% 
filter(time >= t - mv*24*60*60) %>%
filter(time <= t) %>%
filter(strftime(time, format = "%H:%M", tz = "UTC") >= strftime(t-15*60, format = "%H:%M", tz = "UTC"))
} else {
ts %>% 
filter(time >= t - mv*24*60*60) %>%
filter(time <= t) %>%
filter(strftime(time, format = "%H:%M", tz = "UTC") >= strftime(t-15*60, format = "%H:%M", tz = "UTC")) %>%
filter(strftime(time, format = "%H:%M", tz = "UTC") <= strftime(t+15*60, format = "%H:%M", tz = "UTC"))
}
Output:
# A tibble: 5 x 2
time                  data
<dttm>               <dbl>
1 2020-06-18 23:45:00  0.543
2 2020-06-19 23:30:00 -0.177
3 2020-06-19 23:45:00 -0.924
4 2020-06-20 23:30:00 -0.936
5 2020-06-20 23:45:00 -0.209
Desired output:
# A tibble: 7 x 2
time                  data
<dttm>               <dbl>
1 2020-06-18 23:45:00  0.543
2 2020-06-19 00:00:00 -0.413
3 2020-06-19 23:30:00 -0.177
4 2020-06-19 23:45:00 -0.924
5 2020-06-20 00:00:00 -0.821
6 2020-06-20 23:30:00 -0.936
7 2020-06-20 23:45:00 -0.209

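There seems to be a problem with the transition between days, but I don't know how to solve it, and I haven't been able to find a similar question. Is there a way to do this (elegantly)?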

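It seems that strftime(ts$time[1], format = "%H:%M", tz = "UTC") > strftime(t, format = "%H:%M", tz = "UTC") evaluates to FALSE, which is right or wrong depending on how you look at it: as strings, "00:00" sorts before "23:45", even though midnight of the next day comes after 23:45.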
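To see why the question's filter returns zero rows for 23:45, here is a minimal base-R sketch that reproduces the two string comparisons on one day of 15-minute stamps. The names `t0`, `lo`, `hi`, `stamps` and `hhmm` are illustrative only.

```r
# Compare clock times as "HH:MM" strings, as the question's filters do
t0 <- as.POSIXct("2020-06-20 23:45", tz = "UTC")
lo <- format(t0 - 15 * 60, "%H:%M", tz = "UTC")  # "23:30"
hi <- format(t0 + 15 * 60, "%H:%M", tz = "UTC")  # "00:00" (already the next day)

# One day of 15-minute stamps ending at t0
stamps <- seq(t0 - 86400, t0, by = 15 * 60)
hhmm <- format(stamps, "%H:%M", tz = "UTC")

# Lexically "00:00" sorts before "23:30", so no string can satisfy
# hhmm >= "23:30" and hhmm <= "00:00" at the same time
keep <- hhmm >= lo & hhmm <= hi
sum(keep)
```

The lower-bound filter keeps only 23:30 and 23:45, the upper-bound filter keeps only 00:00, and their intersection is empty.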

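To get around this, you need the full YYYY-MM-DD HH:MM so that the comparison is evaluated "correctly". That is what happens if you compare the whole date-time string rather than only the hours and minutes.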
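A small base-R sketch of that point (the names `t0`, `x` and `fmt` are illustrative): comparing full date-times, either as POSIXct values or as "YYYY-MM-DD HH:MM" strings, handles a window that crosses midnight.

```r
t0 <- as.POSIXct("2020-06-20 23:45", tz = "UTC")
x <- as.POSIXct(c("2020-06-20 23:30", "2020-06-21 00:00", "2020-06-21 00:15"),
                tz = "UTC")

# Full date-times: the window [t0 - 15 min, t0 + 15 min] spans midnight correctly
in_window <- x >= t0 - 15 * 60 & x <= t0 + 15 * 60
in_window   # TRUE TRUE FALSE

# Full "YYYY-MM-DD HH:MM" strings give the same answer,
# because this format sorts in chronological order
fmt <- "%Y-%m-%d %H:%M"
x_chr <- format(x, fmt, tz = "UTC")
in_window_chr <- x_chr >= format(t0 - 15 * 60, fmt, tz = "UTC") &
  x_chr <= format(t0 + 15 * 60, fmt, tz = "UTC")
identical(in_window, in_window_chr)  # TRUE
```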

We can get the intervals by adding a dummy variable time_ that holds the HH:MM of each row, and then treating these values as strings:

library(stringr) # for str_detect()

# Problematic time stamp
t <- ymd_hm("2020-06-20 23:45", tz="UTC")

ts %>% filter(
  between(
    time,
    left = t - mv*24*60*60 - 15*60,
    right = t
  )
) %>% mutate(
  # clock time of each row as an "HH:MM" string
  time_ = strftime(time, format = "%H:%M", tz = "UTC")
) %>% filter(
  str_detect(
    time_,
    # regex alternation of the allowed clock times, e.g. "23:30|23:45|00:00"
    pattern = seq(
      t - 15*60,
      t + 15*60,
      by = "15 mins"
    ) %>% strftime(format = "%H:%M", tz = "UTC") %>% paste(
      collapse = "|"
    )
  )
)

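This gives the output below. Note that it also keeps 2020-06-18 23:30, because the lower bound starts 15 minutes before t - mv days: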

# A tibble: 8 x 3
time                  data time_
<dttm>               <dbl> <chr>
1 2020-06-18 23:30:00  1.00  23:30
2 2020-06-18 23:45:00  0.543 23:45
3 2020-06-19 00:00:00 -0.413 00:00
4 2020-06-19 23:30:00 -0.177 23:30
5 2020-06-19 23:45:00 -0.924 23:45
6 2020-06-20 00:00:00 -0.821 00:00
7 2020-06-20 23:30:00 -0.936 23:30
8 2020-06-20 23:45:00 -0.209 23:45
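A variation on the string-matching answer, sketched in base R only: `%in%` does exact matching, so there is no need to build a regular expression with `paste(collapse = "|")`. Unlike that answer, the lower bound here is exactly t - mv days, which yields the 7 rows of the desired output. The names `t0`, `target`, `stamps` and `kept` are illustrative, and `stamps` stands in for the `time` column.

```r
t0 <- as.POSIXct("2020-06-20 23:45", tz = "UTC")
mv <- 2

# Clock times to keep: t0 - 15 min, t0, t0 + 15 min, as "HH:MM" strings
target <- format(seq(t0 - 15 * 60, t0 + 15 * 60, by = 15 * 60), "%H:%M", tz = "UTC")
target   # "23:30" "23:45" "00:00"

# 15-minute stamps from t0 - mv days up to and including t0
stamps <- seq(t0 - mv * 86400, t0, by = 15 * 60)

# Exact membership test on the clock time instead of a regex
kept <- stamps[format(stamps, "%H:%M", tz = "UTC") %in% target]
format(kept, "%Y-%m-%d %H:%M", tz = "UTC")
```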
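
Another way is to work with the number of seconds since midnight (UTC):

ts %>%
  # keep the mv days up to and including t
  filter(between(time, t - days(mv), t)) %>%
  # aux = seconds since midnight (UTC)
  mutate(aux = as.numeric(time) %% (60 * 60 * 24)) %>%
  # keep rows within +/- 900 s (15 min) of t's time of day; aux == 0 adds midnight back
  filter(between(aux,
                 (as.numeric(t) %% (60 * 60 * 24) - 900),
                 (as.numeric(t) %% (60 * 60 * 24) + 900)) |
           aux == 0) %>%
  select(-aux)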

# # A tibble: 7 x 2
#   time                  data
#   <dttm>               <dbl>
# 1 2020-06-18 23:45:00  0.543
# 2 2020-06-19 00:00:00 -0.413
# 3 2020-06-19 23:30:00 -0.177
# 4 2020-06-19 23:45:00 -0.924
# 5 2020-06-20 00:00:00 -0.821
# 6 2020-06-20 23:30:00 -0.936
# 7 2020-06-20 23:45:00 -0.209

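This is probably very specific to this particular task and a bit hard to read. The interval reflects a duration (a fixed number of seconds). For similar cases where the window crosses into another day, you would need to change the offset and adjust the values by 86400. This version does not work if t is at midnight, or if the offset is not 15 minutes.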
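One way around those limitations, sketched under the assumption that the times are in UTC (so every day has 86400 seconds), is to measure a circular distance between clock times so that 23:45 and 00:00 are 900 seconds apart. `tod_distance()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: circular distance in seconds between the clock times
# (seconds since midnight, UTC) of x and t
tod_distance <- function(x, t) {
  d <- abs(as.numeric(x) %% 86400 - as.numeric(t) %% 86400)
  pmin(d, 86400 - d)  # wrap around midnight
}

t0 <- as.POSIXct("2020-06-20 23:45", tz = "UTC")
times <- seq(t0 - 2 * 86400, t0, by = 15 * 60)   # the last 2 days up to t0

# Any margin works, and t0 may be midnight
kept <- times[tod_distance(times, t0) <= 15 * 60]
format(kept, "%Y-%m-%d %H:%M", tz = "UTC")
```

Because no special case such as `aux == 0` is needed, changing the margin only means changing `15 * 60`.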

If you only have 2 days, here is another way (using periods instead of durations):

ts %>%
filter(between(time, t - days(mv), t)) %>%
filter(between(time, t - minutes(15), t + minutes(15)) |
between(time, t - days(1) - minutes(15), t - days(1) + minutes(15)) | 
between(time, t - days(2) - minutes(15), t - days(2) + minutes(15)))

In this example, it gives the same result. If you want to adjust the margins, you need to change the values.
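The period version hard-codes one `between()` term per day. A base-R sketch that builds those terms for any `mv` and margin might look like this; `filter_daily_windows()` and `ts_df` are hypothetical names, and it assumes UTC, where one day is exactly 86400 seconds (so it matches `days(1)` there).

```r
# Hypothetical helper: keep rows within +/- margin seconds of t - k days,
# for k = 0..mv, restricted to [t - mv days, t] as in the question
filter_daily_windows <- function(df, t, mv = 2, margin = 15 * 60) {
  secs <- as.numeric(df$time)
  centers <- as.numeric(t) - (0:mv) * 86400
  # one logical vector per day, OR-ed together
  near <- Reduce(`|`, lapply(centers, function(ct) abs(secs - ct) <= margin))
  in_range <- secs >= as.numeric(t) - mv * 86400 & secs <= as.numeric(t)
  df[near & in_range, , drop = FALSE]
}

# Data shaped like the question's, in a plain data frame
time <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
            as.POSIXct("2020-12-31 23:45", tz = "UTC"), by = 15 * 60)
ts_df <- data.frame(time = time, data = sin(seq_along(time)))

filter_daily_windows(ts_df, as.POSIXct("2020-06-20 23:45", tz = "UTC"))
```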

By the way: you shouldn't use t as an object name in R, because it is already the name of a function (t(), the transpose).
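A short illustration of why the name is confusing rather than fatal: when a name is used in a function call, R skips bindings that are not functions, so `t(m)` still reaches `base::t`, but `t` on its own now means the timestamp. `m` and `t_ref` are illustrative names.

```r
t <- as.POSIXct("2020-06-20 23:45", tz = "UTC")  # shadows the name of base::t
m <- matrix(1:6, nrow = 2)

dim(t(m))   # 3 2: the call still finds the transpose function
class(t)    # "POSIXct" "POSIXt": but the bare name is now the timestamp

# A distinct name avoids the ambiguity
t_ref <- as.POSIXct("2020-06-20 23:45", tz = "UTC")
```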

HTH
