Remove gaps between timestamps in R



I want to remove the rows in df that are less than an hour apart from the previous row that is kept.

For example:

timestamp
2021-03-01 12:00
2021-03-01 12:10
2021-03-01 12:20
2021-03-01 12:30
2021-03-01 12:40
2021-03-01 13:00
2021-03-01 14:30
2021-03-01 15:30
2021-03-01 16:30
2021-03-02 12:00
2021-03-02 12:10
2021-03-02 12:20
2021-03-02 12:30
2021-03-02 12:40
2021-03-02 13:00
2021-03-03 11:00
2021-03-03 11:10
2021-03-03 11:20
2021-03-03 11:30
2021-03-03 11:40
2021-03-03 12:00
2021-03-03 13:10
2021-03-03 14:10
2021-03-03 15:10

The dates and gaps in df can fall at any hour, and not all gaps are 10 minutes.

What I want to end up with is:

timestamp

2021-03-01 12:00
2021-03-01 13:00
2021-03-01 14:30
2021-03-01 15:30
2021-03-01 16:30
2021-03-02 12:00
2021-03-02 13:00
2021-03-03 11:00
2021-03-03 12:00
2021-03-03 13:10
2021-03-03 14:10
2021-03-03 15:10

TIA-

You can use the function round_date() from lubridate:

library(lubridate)

df <- data.frame(
  id = 1:4,
  timestamp = ymd_hm(c(
    "2021-03-01 12:10",
    "2021-03-01 12:00",
    "2021-03-01 13:30",
    "2021-03-01 14:00"
  ))
)

# Round every timestamp to the nearest full hour
precise <- round_date(df$timestamp, unit = "hour")

# Keep only the rows whose timestamp already falls on a full hour
df |> dplyr::filter(timestamp %in% precise)

(P.S.: these are date-times, not intervals.)
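
The on-the-hour check in the answer above can also be written without lubridate, by comparing each timestamp with itself truncated to the hour. This is only a minimal sketch in base R: the variable names `ts` and `on_hour` are illustrative, and parsing in UTC is an assumption, not something the question specifies.

```r
# Illustrative data: the same four values as above, parsed in UTC (an assumption)
ts <- as.POSIXct(
  c("2021-03-01 12:10", "2021-03-01 12:00",
    "2021-03-01 13:30", "2021-03-01 14:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

# A timestamp is on the hour when truncating it to the hour leaves it unchanged
on_hour <- ts == as.POSIXct(trunc(ts, units = "hours"))

# Keep only the on-the-hour timestamps
ts[on_hour]
```

Like the round_date() version, this keeps only rows that fall exactly on a full hour, so rows such as 14:30 from the question would be dropped.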

Another option, also using lubridate:

library(dplyr)
library(lubridate)

timestamp <- c("2021-03-01 12:00", "2021-03-01 12:10", "2021-03-01 12:20", "2021-03-01 12:30",
               "2021-03-01 12:40", "2021-03-01 13:00", "2021-03-01 14:30", "2021-03-01 15:30",
               "2021-03-01 16:30", "2021-03-02 12:00", "2021-03-02 12:10", "2021-03-02 12:20",
               "2021-03-02 12:30", "2021-03-02 12:40", "2021-03-02 13:00", "2021-03-03 11:00",
               "2021-03-03 11:10", "2021-03-03 11:20", "2021-03-03 11:30", "2021-03-03 11:40",
               "2021-03-03 12:00", "2021-03-03 13:10", "2021-03-03 14:10", "2021-03-03 15:10")
data <- data.frame(timestamp)

timestamp_exit <- data %>%
  mutate(timestamp = format(as.POSIXct(timestamp), format = '%Y-%m-%d %H:%M')) %>%
  filter(minute(timestamp) == 0)

Output

> timestamp_exit
timestamp
1 2021-03-01 12:00
2 2021-03-01 13:00
3 2021-03-02 12:00
4 2021-03-02 13:00
5 2021-03-03 11:00
6 2021-03-03 12:00

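Regarding the time gaps: here is a small function that walks along the timestamp column and, starting from the first row, finds each next timestamp that is at least one hour after the last one it kept. Note that no rounding of minutes takes place here, but the result matches your desired output.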

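# Returns the indices of the rows to keep, starting at row `ind`.
# Assumes the timestamps are sorted in ascending order; first() comes from dplyr.
hourinterval <- function(interval, ind = 1) {
  # Index of the first timestamp at least one hour after the current one
  ind.next <- first(which(difftime(interval, interval[ind], units = "hours") >= 1))
  if (is.na(ind.next))
    return(ind)
  else
    return(c(ind, hourinterval(interval, ind.next)))
}

timestamp_exit2 <- data %>%
  mutate(timestamp = format(as.POSIXct(timestamp), format = '%Y-%m-%d %H:%M')) %>%
  slice(hourinterval(timestamp))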

Output

> timestamp_exit2
timestamp
1  2021-03-01 12:00
2  2021-03-01 13:00
3  2021-03-01 14:30
4  2021-03-01 15:30
5  2021-03-01 16:30
6  2021-03-02 12:00
7  2021-03-02 13:00
8  2021-03-03 11:00
9  2021-03-03 12:00
10 2021-03-03 13:10
11 2021-03-03 14:10
12 2021-03-03 15:10
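
The recursive helper above makes one nested call per kept row, so a very long column could in principle run into R's expression nesting limit. An iterative sketch of the same greedy rule (keep a row when it is at least one hour after the last kept row) avoids that. It assumes the timestamps are POSIXct and sorted in ascending order; `keep_hourly` and `min_gap_hours` are hypothetical names, not part of any package.

```r
keep_hourly <- function(x, min_gap_hours = 1) {
  # x: POSIXct vector, assumed sorted in ascending order
  if (length(x) == 0) return(integer(0))
  keep <- 1L            # always keep the first row
  last_kept <- x[1]
  for (i in seq_along(x)[-1]) {
    # Compare against the last KEPT row, not simply the previous row
    if (difftime(x[i], last_kept, units = "hours") >= min_gap_hours) {
      keep <- c(keep, i)
      last_kept <- x[i]
    }
  }
  keep
}

# Illustrative subset of the question's data, parsed in UTC (an assumption)
ts <- as.POSIXct(
  c("2021-03-01 12:00", "2021-03-01 12:10", "2021-03-01 12:40",
    "2021-03-01 13:00", "2021-03-01 14:30", "2021-03-01 15:30"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)
idx <- keep_hourly(ts)
ts[idx]
```

For these six values `idx` is 1, 4, 5, 6, so 12:10 and 12:40 are dropped. The returned indices can be passed to slice() in the same way as the result of hourinterval().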
