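I have seen various solutions to this problem that work on dates alone, but the time component is tripping me up. I have two data frames, each with a POSIXct column named "datetime". In DF1, the values in that column are rounded to the nearest hour. In DF2, the time component is not rounded and can fall at any time. The data frames look like this: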
DF1

datetime | X | Y | Z
---|---|---|---
2020-09-01 03:00:00 | 1 | 3 | 4
2020-09-02 12:00:00 | 12 | 3 | 5
2020-09-02 22:00:00 | 4 | 9 | 19
2020-09-03 01:00:00 | 4 | 10 | 2
2020-09-04 06:00:00 | 4 | 12 | 1
2020-09-04 08:00:00 | 11 | 13 | 10
Add helper columns for merge and group_by, then use merge and dplyr to filter each group down to the row with the closest time:
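library(dplyr)

# Helper column: calendar date, used as the merge key
df1$tmp <- as.Date(df1$datetime)
df2$tmp <- as.Date(df2$datetime)
# Helper column: one group per df1 row
df1$grp <- seq_len(nrow(df1))

merge(df1, df2, by = "tmp") %>%
  group_by(grp) %>%
  # Within each df1 row, keep the df2 row with the smallest time difference
  slice(which.min(abs(difftime(datetime.x, datetime.y)))) %>%
  ungroup() %>%
  select(-c(tmp, grp, datetime.y))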
# A tibble: 6 × 5
  datetime.x              X     Y     Z Var
  <chr>               <int> <int> <int> <chr>
1 2020-09-01 03:00:00     1     3     4 B
2 2020-09-02 12:00:00    12     3     5 A
3 2020-09-02 22:00:00     4     9    19 B
4 2020-09-03 01:00:00     4    10     2 B
5 2020-09-04 06:00:00     4    12     1 B
6 2020-09-04 08:00:00    11    13    10 B
Data
df1 <- structure(list(datetime = c("2020-09-01 03:00:00", "2020-09-02 12:00:00",
"2020-09-02 22:00:00", "2020-09-03 01:00:00", "2020-09-04 06:00:00",
"2020-09-04 08:00:00"), X = c(1L, 12L, 4L, 4L, 4L, 11L), Y = c(3L,
3L, 9L, 10L, 12L, 13L), Z = c(4L, 5L, 19L, 2L, 1L, 10L)), class = "data.frame", row.names = c(NA,
-6L))
df2 <- structure(list(datetime = c("2020-09-01 02:23:14", "2020-09-01 03:12:09",
"2020-09-02 11:52:15", "2020-09-02 12:15:44", "2020-09-02 22:31:56",
"2020-09-02 21:38:05", "2020-09-03 01:11:39", "2020-09-03 00:59:33",
"2020-09-04 05:12:19", "2020-09-04 06:07:09", "2020-09-04 08:22:28",
"2020-09-04 07:50:17"), Var = c("A", "B", "A", "B", "A", "B",
"A", "B", "A", "B", "A", "B")), class = "data.frame", row.names = c(NA,
-12L))
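
Note that the merge on the tmp date column only compares rows that fall on the same calendar date, so a df2 time just across midnight would never be considered. As a rough sketch of the same nearest-time idea without the date helper column, the comparison can be done directly on POSIXct values in base R. The helper name nearest_idx and the UTC time zone are assumptions made for illustration and are not part of the original answer; the data is a small subset of the values shown above.

```r
# Hypothetical helper (not from the original answer): for each time in x,
# return the index of the closest time in y, ignoring calendar dates.
nearest_idx <- function(x, y) {
  x <- as.numeric(x)  # seconds since the epoch
  y <- as.numeric(y)
  vapply(x, function(t) which.min(abs(y - t)), integer(1))
}

# Illustrative subset of df1 and df2; tz = "UTC" is an assumption.
t1 <- as.POSIXct(c("2020-09-01 03:00:00", "2020-09-02 12:00:00"), tz = "UTC")
t2 <- as.POSIXct(c("2020-09-01 02:23:14", "2020-09-01 03:12:09",
                   "2020-09-02 11:52:15", "2020-09-02 12:15:44"), tz = "UTC")
var2 <- c("A", "B", "A", "B")

# Look up the nearest df2 row for each df1 time and attach its Var value
idx <- nearest_idx(t1, t2)
result <- data.frame(datetime = t1, Var = var2[idx])
```

This compares every time in x against every time in y, which is fine for small inputs like these; for large data frames a rolling join would likely scale better.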