Efficient way of row binding time series in a data.table, with correctly sorted timestamps

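Is there a more efficient way to row bind (or effectively merge) two or more very large time series with data.table? The time series have some differing columns, so I am using fill = TRUE.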

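I want all rows from each time series to appear in the final data.table. I can do that as shown below, but the timestamps are not sorted in dt3. I have to create dt4 to get the timestamps in order.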

Is there a more efficient way to do this kind of rbind/time series merge in data.table?

library(data.table)
tm <- seq(as.POSIXct("2018-05-12 00:00"), as.POSIXct("2018-05-14"), by = "hours")
dt <- data.table(time = tm, x = seq(1, length(tm), by = 1))
set.seed(1)
dt2 <- data.table(time = tm[sample(length(tm), size = 8)] + rnorm(n = 8, 0, 60),
y = rnorm(8))
# Can a one liner here get me the output in `dt4` with some kind of row bind? 
#  Is there a way to do a row bind here instead that avoids the creation of a new object dt4 that takes the sorted rows?
dt3 <- rbind(dt, dt2, fill = TRUE)
dt4 <- dt3[order(time)]
tail(dt4, 20)
#                   time  x           y
# 1: 2018-05-13 08:00:00 33          NA
# 2: 2018-05-13 09:00:00 34          NA
# 3: 2018-05-13 10:00:00 35          NA
# 4: 2018-05-13 11:00:00 36          NA
# 5: 2018-05-13 12:00:00 37          NA
# 6: 2018-05-13 13:00:00 38          NA
# 7: 2018-05-13 14:00:00 39          NA
# 8: 2018-05-13 14:59:41 NA  0.94383621
# 9: 2018-05-13 15:00:00 40          NA
# 10: 2018-05-13 16:00:00 41          NA
# 11: 2018-05-13 16:01:30 NA  0.82122120
# 12: 2018-05-13 17:00:00 42          NA
# 13: 2018-05-13 17:00:44 NA -0.04493361
# 14: 2018-05-13 18:00:00 43          NA
# 15: 2018-05-13 19:00:00 44          NA
# 16: 2018-05-13 20:00:00 45          NA
# 17: 2018-05-13 21:00:00 46          NA
# 18: 2018-05-13 22:00:00 47          NA
# 19: 2018-05-13 23:00:00 48          NA
# 20: 2018-05-14 00:00:00 49          NA

If you set the time column as the key

setkey(dt, time)
setkey(dt2, time)

then you can use merge.data.table

merge(dt,dt2,all=TRUE)
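
As a quick check, here is a minimal sketch that builds data similar to the question's and compares the keyed merge with the rbind-then-order result. The simplified dt2 (fixed 30-minute offsets instead of random noise) is an assumption made so that the example is deterministic. It also assumes that no timestamp appears in both tables: where timestamps coincide, a merge combines the two rows into one, whereas rbind keeps both.

```r
library(data.table)

tm <- seq(as.POSIXct("2018-05-12 00:00", tz = "UTC"),
          as.POSIXct("2018-05-14", tz = "UTC"), by = "hours")
dt <- data.table(time = tm, x = seq_along(tm))
# dt2 timestamps are shifted by 30 minutes so they never collide with dt
dt2 <- data.table(time = tm[c(40, 5, 17)] + 1800, y = c(0.1, 0.2, 0.3))

setkey(dt, time)
setkey(dt2, time)

merged <- merge(dt, dt2, all = TRUE)                 # full outer join on the key
bound  <- rbind(dt, dt2, fill = TRUE)[order(time)]   # the question's approach

# Compare contents and row order; check.attributes = FALSE ignores the key
all.equal(merged, bound, check.attributes = FALSE)
```

The merged result is also keyed by time, so later joins or merges on that column can reuse the ordering.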

Note that if a time series is known to be sorted already (dt is, but dt2 is not), you can speed things up by setting the data.table's "sorted" attribute instead of calling setkey.

attr(dt, 'sorted') = 'time'
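
If the main goal is to avoid keeping a second, sorted copy (dt4), one option is to sort the bound table by reference with setorder() or setkey(). The sketch below shows this on simplified example data (the fixed offsets in dt2 are an assumption for reproducibility). It also uses setattr(), which sets an attribute by reference, as an alternative to the attr assignment above. Marking a table as sorted is only safe when it really is sorted, because data.table may rely on the attribute without re-checking it.

```r
library(data.table)

tm <- seq(as.POSIXct("2018-05-12 00:00", tz = "UTC"),
          as.POSIXct("2018-05-14", tz = "UTC"), by = "hours")
dt <- data.table(time = tm, x = seq_along(tm))
dt2 <- data.table(time = tm[c(40, 5, 17)] + 1800, y = c(0.1, 0.2, 0.3))

# Bind, then reorder the rows of the result in place (no separate dt4)
dt3 <- rbind(dt, dt2, fill = TRUE)
setorder(dt3, time)   # or setkey(dt3, time) to sort and key in one step

# dt is sorted by construction, so mark it as keyed by reference
setattr(dt, "sorted", "time")
key(dt)
```

Because setorder() returns its input invisibly, the bind and sort can also be written as a single expression: setorder(rbind(dt, dt2, fill = TRUE), time).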
