I have two data frames:
df1 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))
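I would like to merge them by timestamp into a single data frame, with the rows in chronological order and the non-matching columns simply filled with NA: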
timestamp value text
"2022-07-29 00:00:00" 1 NA
"2022-07-29 00:00:05" NA a
...
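I can't for the life of me figure out why a simple merge(x, y, by=) doesn't work. Neither does left_join. This seems like it should be easy, but I can't work it out.
As a bonus, it would be nice if the text were also carried forward: each time the text changes, all rows below it keep that last text until df2 signals the next change: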
timestamp value text
"2022-07-29 00:00:00" 1 NA
"2022-07-29 00:00:05" NA a
"2022-07-29 00:05:00" 2 a
"2022-07-29 00:05:05" NA b
"2022-07-29 00:10:00" 3 b
"2022-07-29 00:15:00" 4 b
"2022-07-29 00:20:00" 5 b
"2022-07-29 00:20:05" NA c
Thanks.
You need to perform a full join.
tidyverse solution:
library(dplyr)
df1 %>%
  full_join(df2, by = "timeStamp") %>%  # keep rows from both data frames
  arrange(timeStamp) %>%                # sort chronologically
  tidyr::fill(text)                     # carry the last text forward
timeStamp value text
1 2022-07-29 00:00:00 1 <NA>
2 2022-07-29 00:00:05 NA a
3 2022-07-29 00:05:00 2 a
4 2022-07-29 00:05:05 NA b
5 2022-07-29 00:10:00 3 b
6 2022-07-29 00:15:00 4 b
7 2022-07-29 00:20:00 5 b
8 2022-07-29 00:20:05 NA c
Base R merge:
merge(df1, df2, by = "timeStamp", all = TRUE)
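A minimal sketch of why the default merge and a left join fall short here, assuming the corrected df1 and df2 from the Note at the end of this thread. The tz = "UTC" argument is my own addition so the example does not depend on the local time zone. No timestamp appears in both data frames, so only a full join keeps every row.

```r
# Corrected sample data (same values as the Note at the end of the thread).
# tz = "UTC" is an assumption added to make the example reproducible.
df1 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:00", "2022-07-29 00:05:00",
                           "2022-07-29 00:10:00", "2022-07-29 00:15:00",
                           "2022-07-29 00:20:00"), tz = "UTC"),
  value = c(1, 2, 3, 4, 5))
df2 <- data.frame(
  timeStamp = as.POSIXct(c("2022-07-29 00:00:05", "2022-07-29 00:05:05",
                           "2022-07-29 00:20:05"), tz = "UTC"),
  text = c("a", "b", "c"))

# Default merge is an inner join: only keys present in both inputs survive.
inner <- merge(df1, df2, by = "timeStamp")
# all.x = TRUE behaves like left_join: every df1 row, text is NA throughout.
left <- merge(df1, df2, by = "timeStamp", all.x = TRUE)
# all = TRUE is a full join: every row from both inputs, sorted by the key.
full <- merge(df1, df2, by = "timeStamp", all = TRUE)

nrow(inner)  # 0
nrow(left)   # 5
nrow(full)   # 8
```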
Using the corrected definitions of df1 and df2 in the Note at the end, perform a merge with all = TRUE and then use na.locf0 to fill in the NAs.
library(zoo)
m <- merge(df1, df2, all = TRUE) |> transform(text = na.locf0(text)); m
## timeStamp value text
## 1 2022-07-29 00:00:00 1 <NA>
## 2 2022-07-29 00:00:05 NA a
## 3 2022-07-29 00:05:00 2 a
## 4 2022-07-29 00:05:05 NA b
## 5 2022-07-29 00:10:00 3 b
## 6 2022-07-29 00:15:00 4 b
## 7 2022-07-29 00:20:00 5 b
## 8 2022-07-29 00:20:05 NA c
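If adding a zoo dependency is not desirable, the same carry-forward step can be sketched in base R. The helper name locf below is hypothetical and is not a function from zoo or base R. It is only meant to illustrate the idea behind na.locf0 on a plain vector.

```r
# Hypothetical helper (not part of zoo): carry the last non-NA value forward.
locf <- function(x) {
  idx <- cumsum(!is.na(x))      # count of non-NA values seen so far (0 = none yet)
  c(NA, x[!is.na(x)])[idx + 1]  # slot 1 holds the NA used for leading gaps
}

# The text column as it looks right after the full merge.
text <- c(NA, "a", NA, "b", NA, NA, NA, "c")
locf(text)
# [1] NA  "a" "a" "b" "b" "b" "b" "c"
```

It could be applied as transform(m, text = locf(text)) in place of na.locf0.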
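If the different values of text actually refer to different series that should go in separate columns, then we can create the following zoo series. (If it needs to be converted to a data frame, use fortify.zoo(z).)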
z <- read.zoo(m, split = "text"); z
## a b c
## 2022-07-29 00:00:05 NA NA NA
## 2022-07-29 00:05:00 2 NA NA
## 2022-07-29 00:05:05 NA NA NA
## 2022-07-29 00:10:00 NA 3 NA
## 2022-07-29 00:15:00 NA 4 NA
## 2022-07-29 00:20:00 NA 5 NA
## 2022-07-29 00:20:05 NA NA NA
Note
df1 <- data.frame(timeStamp = as.POSIXct(c("2022-07-29 00:00:00",
"2022-07-29 00:05:00", "2022-07-29 00:10:00","2022-07-29 00:15:00",
"2022-07-29 00:20:00")), value = c(1,2,3,4,5))
df2 <- data.frame(timeStamp = as.POSIXct(c("2022-07-29 00:00:05",
"2022-07-29 00:05:05", "2022-07-29 00:20:05")),
text = c("a","b","c"))