R - How to tidy up two time-series data frames that have no matching times and contain different columns



I have two data frames:

df1 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))

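I would like to merge them by timestamp into a single data frame in which the rows are in chronological order and the columns that have no match are simply filled with NA: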

timestamp              value     text
"2022-07-29 00:00:00"  1         NA
"2022-07-29 00:00:05"  NA        a
...

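I can't for the life of me figure out why a simple merge(x, y, by = ) doesn't work. left_join doesn't either. This seems so simple, and yet I can't work it out.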

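As a bonus, it would be great if it also worked so that each time the text changes, all rows below it keep the last text until df2 signals that the text has changed again: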

timestamp              value     text
"2022-07-29 00:00:00"  1         NA
"2022-07-29 00:00:05"  NA        a
"2022-07-29 00:05:00"  2         a
"2022-07-29 00:05:05"  NA        b
"2022-07-29 00:10:00"  3         b
"2022-07-29 00:15:00"  4         b
"2022-07-29 00:20:00"  5         b
"2022-07-29 00:20:05"  NA        c

Thanks

You need to perform a full join.
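To see why the join type matters here, the following is a minimal base R sketch. It assumes the corrected sample data from the Note at the end of this thread; the explicit tz = "UTC" is an assumption added only to make the example reproducible. By default merge() uses all = FALSE, which is an inner join, and the two data frames share no timestamps, so nothing is returned. Likewise, left_join keeps every row of df1 but only the matching rows of df2, so the timestamps from df2 never appear.

```r
# Corrected sample data (tz = "UTC" is an assumption made for reproducibility)
df1 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00"), tz = "UTC"),
  value = 1:5
)
df2 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05"), tz = "UTC"),
  text = c("a", "b", "c")
)

inner <- merge(df1, df2, by = "timeStamp")              # default all = FALSE: inner join
outer <- merge(df1, df2, by = "timeStamp", all = TRUE)  # full outer join

nrow(inner)  # 0: no timestamp occurs in both data frames
nrow(outer)  # 8: the 5 rows of df1 plus the 3 rows of df2
```

The full join keeps the rows of both inputs and fills the columns that have no counterpart with NA, which is the layout asked for in the question.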

A tidyverse solution:

library(dplyr)

df1 %>%
  full_join(df2, by = "timeStamp") %>%
  arrange(timeStamp) %>%
  tidyr::fill(text)
            timeStamp value text
1 2022-07-29 00:00:00     1 <NA>
2 2022-07-29 00:00:05    NA    a
3 2022-07-29 00:05:00     2    a
4 2022-07-29 00:05:05    NA    b
5 2022-07-29 00:10:00     3    b
6 2022-07-29 00:15:00     4    b
7 2022-07-29 00:20:00     5    b
8 2022-07-29 00:20:05    NA    c

A base R merge:

merge(df1, df2, by = "timeStamp", all = TRUE)

Using the corrected definitions of df1 and df2 shown in the Note at the end, perform a merge with all = TRUE and then fill in the NAs using na.locf0.

library(zoo)
m <- merge(df1, df2, all = TRUE) |> transform(text = na.locf0(text)); m
##             timeStamp value text
## 1 2022-07-29 00:00:00     1 <NA>
## 2 2022-07-29 00:00:05    NA    a
## 3 2022-07-29 00:05:00     2    a
## 4 2022-07-29 00:05:05    NA    b
## 5 2022-07-29 00:10:00     3    b
## 6 2022-07-29 00:15:00     4    b
## 7 2022-07-29 00:20:00     5    b
## 8 2022-07-29 00:20:05    NA    c
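If adding the zoo dependency is not desirable, the same forward fill can be sketched in base R. The helper name locf below is hypothetical and not part of any package; it is only an illustration of the idea behind na.locf0, not a drop-in replacement for it.

```r
# Hypothetical helper: carry the last non-NA value forward.
# Leading NAs (before any value has been seen) stay NA.
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 holds the NA used before any value
}

# The text column as it looks after the full merge
text <- c(NA, "a", NA, "b", NA, NA, NA, "c")
locf(text)
# [1] NA  "a" "a" "b" "b" "b" "b" "c"
```

With the merged data frame m from above, this could be applied as transform(m, text = locf(text)).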

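If the different values of text actually refer to different series that should go into separate columns, then we can create the following zoo series. (If it needs to be converted to a data frame, use fortify.zoo(z).)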

z <- read.zoo(m, split = "text"); z
##                      a  b  c
## 2022-07-29 00:00:05 NA NA NA
## 2022-07-29 00:05:00  2 NA NA
## 2022-07-29 00:05:05 NA NA NA
## 2022-07-29 00:10:00 NA  3 NA
## 2022-07-29 00:15:00 NA  4 NA
## 2022-07-29 00:20:00 NA  5 NA
## 2022-07-29 00:20:05 NA NA NA

Note

df1 <- data.frame(timeStamp = as.POSIXct(c("2022-07-29 00:00:00",
"2022-07-29 00:05:00", "2022-07-29 00:10:00","2022-07-29 00:15:00",
"2022-07-29 00:20:00")), value = c(1,2,3,4,5))
df2 <- data.frame(timeStamp = as.POSIXct(c("2022-07-29 00:00:05",
"2022-07-29 00:05:05", "2022-07-29 00:20:05")), 
text = c("a","b","c"))
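The correction matters because as.POSIXct expects the values to convert as a single vector in its first argument; its second positional argument is tz, so strings passed as separate arguments (as in the question) are not treated as additional timestamps. A short sketch of the corrected pattern follows; the name ts and the explicit tz = "UTC" are assumptions made here for illustration.

```r
# All timestamps go into ONE character vector, which is then converted
ts <- as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                   "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                   "2022-07-29 00:20:00"),
                 tz = "UTC")  # explicit time zone: an assumption for reproducibility

length(ts)                           # 5: one element per timestamp
as.numeric(diff(ts), units = "mins") # spacing between consecutive timestamps
```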
