r - How to split a time series into separate events and assign event IDs



I want to split an irregular time series into separate events and assign each event a unique numeric ID within each site.

Here is an example data frame:

structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AllenBrook", "Eastberk"), class = 
"factor"), 
    timestamp = structure(c(10L, 13L, 8L, 4L, 5L, 6L, 7L, 9L, 
    11L, 12L, 1L, 2L, 3L), .Label = c("10/1/12 11:29", "10/1/12 14:29", 
    "10/1/12 17:29", "10/20/12 16:30", "10/20/12 19:30", "10/21/12 1:30", 
    "10/21/12 4:30", "9/5/12 12:30", "9/5/12 4:14", "9/5/12 6:30", 
    "9/5/12 7:14", "9/5/12 7:44", "9/5/12 9:30"), class = "factor")), class 
= "data.frame", row.names = c(NA, 
-13L))

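Events differ in duration and in the number of timestamps they contain, so I want to start a new event whenever more than 12 hours elapse between a timestamp and the next timestamp at the same site. Each event at a site should receive a unique numeric ID. This is the result I want: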

         site      timestamp eventid
1  AllenBrook    9/5/12 6:30       1
2  AllenBrook    9/5/12 9:30       1
3  AllenBrook   9/5/12 12:30       1
4  AllenBrook 10/20/12 16:30       2
5  AllenBrook 10/20/12 19:30       2
6  AllenBrook  10/21/12 1:30       2
7  AllenBrook  10/21/12 4:30       2
8    Eastberk    9/5/12 4:14       1
9    Eastberk    9/5/12 7:14       1
10   Eastberk    9/5/12 7:44       1
11   Eastberk  10/1/12 11:29       2
12   Eastberk  10/1/12 14:29       2
13   Eastberk  10/1/12 17:29       2

Any coding solution is fine, but a tidyverse or data.table solution would be a bonus. Thanks for any help!

With data.table, you could do something like the following:

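library(data.table)
# Parse the timestamps, then start a new event whenever the gap to the
# previous row of the same site exceeds 720 minutes (12 hours)
setDT(tmp)[, timestamp := as.POSIXct(timestamp, format="%m/%d/%y %H:%M")][, 
    eventid := 1L+cumsum(c(0L, as.numeric(diff(timestamp), units="mins")>720)), by=.(site)]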

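diff(timestamp) computes the time difference between adjacent rows, and the comparison checks whether that difference exceeds 12 hours (720 minutes). Because diff() on POSIXct values returns a difftime whose units are picked automatically, it is safest to convert it explicitly, e.g. as.numeric(diff(timestamp), units="mins"), before comparing with 720. A common trick in R is to use cumsum to mark where a new event starts in a sequence and to group the following elements with it until the next event starts. Since diff returns one element fewer than its input, we pad the start with 0L. The 1L+ just makes the IDs start at 1 instead of 0. This assumes the rows are already sorted by time within each site, as they are in the example.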
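To make the cumsum step concrete, here is a minimal base R sketch on a plain numeric vector. The `gaps` values are the gaps in minutes between consecutive AllenBrook timestamps in the example, worked out by hand; the variable names are only illustrative.

```r
# Gaps in minutes between consecutive AllenBrook rows (computed by hand)
gaps <- c(180, 180, 65040, 180, 360, 180)

# TRUE marks a gap longer than 12 hours, i.e. the first row of a new event
new_event <- gaps > 720

# Pad with 0L for the first row (there is one gap fewer than there are rows),
# then accumulate: every TRUE bumps the running ID by one
eventid <- 1L + cumsum(c(0L, new_event))
print(eventid)
```

Each row inherits the count of "long gaps" seen so far, which is why all rows between two long gaps end up with the same ID.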

Output:

          site           timestamp eventid
 1: AllenBrook 2012-09-05 06:30:00       1
 2: AllenBrook 2012-09-05 09:30:00       1
 3: AllenBrook 2012-09-05 12:30:00       1
 4: AllenBrook 2012-10-20 16:30:00       2
 5: AllenBrook 2012-10-20 19:30:00       2
 6: AllenBrook 2012-10-21 01:30:00       2
 7: AllenBrook 2012-10-21 04:30:00       2
 8:   Eastberk 2012-09-05 04:14:00       1
 9:   Eastberk 2012-09-05 07:14:00       1
10:   Eastberk 2012-09-05 07:44:00       1
11:   Eastberk 2012-10-01 11:29:00       2
12:   Eastberk 2012-10-01 14:29:00       2
13:   Eastberk 2012-10-01 17:29:00       2

Data:

tmp <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AllenBrook", "Eastberk"), class = 
     "factor"), 
 timestamp = structure(c(10L, 13L, 8L, 4L, 5L, 6L, 7L, 9L, 
     11L, 12L, 1L, 2L, 3L), .Label = c("10/1/12 11:29", "10/1/12 14:29", 
         "10/1/12 17:29", "10/20/12 16:30", "10/20/12 19:30", "10/21/12 1:30", 
         "10/21/12 4:30", "9/5/12 12:30", "9/5/12 4:14", "9/5/12 6:30", 
         "9/5/12 7:14", "9/5/12 7:44", "9/5/12 9:30"), class = "factor")), class 
 = "data.frame", row.names = c(NA, 
     -13L))
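Since the question also mentions the tidyverse, the same idea could be sketched with dplyr roughly as follows. This is not part of the original answer: it assumes dplyr is installed, uses a reduced copy of the example data with character timestamps, and parses them as UTC, which is an assumption about the time zone. The names `df`, `res`, and `gap_hours` are illustrative.

```r
library(dplyr)

# Reduced copy of the example data (character timestamps instead of factors)
df <- data.frame(
  site = c("AllenBrook", "AllenBrook", "AllenBrook", "AllenBrook",
           "Eastberk", "Eastberk", "Eastberk"),
  timestamp = c("9/5/12 6:30", "9/5/12 9:30", "10/20/12 16:30", "10/20/12 19:30",
                "9/5/12 4:14", "9/5/12 7:44", "10/1/12 11:29"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # tz = "UTC" is an assumption; use the time zone the data were recorded in
  mutate(timestamp = as.POSIXct(timestamp, format = "%m/%d/%y %H:%M", tz = "UTC")) %>%
  arrange(site, timestamp) %>%
  group_by(site) %>%
  mutate(
    # Gap to the previous row of the same site, in hours (NA for the first row)
    gap_hours = as.numeric(difftime(timestamp, lag(timestamp), units = "hours")),
    # A gap above 12 hours starts a new event; the first row never does
    eventid = 1L + cumsum(!is.na(gap_hours) & gap_hours > 12)
  ) %>%
  ungroup()

print(res)
```

Using difftime() with explicit units avoids relying on the automatically chosen units of diff(), and arrange() makes the result independent of the input row order.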
