r语言 - 如何每 15 分钟计算一次"batch"


1 2021-01-01 12:59:38
2 2021-01-01 14:08:59
3 2021-01-01 14:09:08
4 2021-01-01 14:11:30
5 2021-01-01 14:22:19
6 2021-01-01 14:41:07

我希望能够每15分钟统计一次参赛作品的数量,但要滚动。E.g 12:59将在15分钟内为1,14:08=>14:22将全部在15分钟内,因此这将在该批次中返回4,最后14:41将在另一个15分钟批次中单独返回。

我希望这是合理的,并提前感谢。

抱歉没有包含此

> dput(df)
structure(list(ClickedDate = structure(c(1609460198.707, 1609462979.593, 
1609465088.437, 1609476270.88, 1609478479.177, 1609479667.373, 
1609493081.887, 1609499187.29, 1609507506.37, 1609510989.533, 
1609511522.023, 1609511894.067, 1609512194.773, 1609512377.227, 
1609514474.153), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), batch_no = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 12L, 12L, 13L), batch_size = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 3L, 3L, 3L, 1L)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

新版-感谢您为此所做的工作。我收到一个错误

Error in UseMethod("mutate") : 
no applicable method for 'mutate' applied to an object of class "c('integer', 'numeric')"

这似乎很奇怪,我的变量在类中

> class(df$ClickedDate)
[1] "POSIXct" "POSIXt" 

这和mutate一起工作吗,或者我需要转换它吗?

> dput(df)
structure(list(ClickedDate = structure(c(1609460198.707, 1609462979.593, 
1609465088.437, 1609476270.88, 1609478479.177, 1609479667.373, 
1609493081.887, 1609499187.29, 1609507506.37, 1609510989.533, 
1609511522.023, 1609511894.067, 1609512194.773, 1609512377.227, 
1609514474.153), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), batch_no = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 12L, 12L, 13L), batch_size = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 3L, 3L, 3L, 1L)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

提前感谢

在这种情况下使用runner包会有所帮助。使用以下策略

library(tidyverse)
library(runner)
df %>% mutate(b_len = runner::runner(x = ClickedDate,
idx = ClickedDate,
k = "15 mins",
lag = "-14 mins",
f = length),
b_no = purrr::accumulate(seq_len(length(b_len)-1), .init = b_len[1], ~ifelse(.x > .y, .x, .x + b_len[.x +1])),
b_no = as.integer(as.factor(b_no))) %>%
group_by(b_no) %>%
mutate(b_len = n())
# A tibble: 15 x 3
# Groups:   b_no [12]
ClickedDate         b_len  b_no
<dttm>              <int> <int>
1 2021-01-01 00:16:38     1     1
2 2021-01-01 01:02:59     1     2
3 2021-01-01 01:38:08     1     3
4 2021-01-01 04:44:30     1     4
5 2021-01-01 05:21:19     1     5
6 2021-01-01 05:41:07     1     6
7 2021-01-01 09:24:41     1     7
8 2021-01-01 11:06:27     1     8
9 2021-01-01 13:25:06     1     9
10 2021-01-01 14:23:09     2    10
11 2021-01-01 14:32:02     2    10
12 2021-01-01 14:38:14     3    11
13 2021-01-01 14:43:14     3    11
14 2021-01-01 14:46:17     3    11
15 2021-01-01 15:21:14     1    12

备注-

runner函数中的lag参数允许向后的时间窗口(滚动(,所以我使用负滞后来使用向前的时间窗口
  • runner中的k自变量适用于给定长度的滚动窗口
  • b_no列最初识别滑动/滚动窗口,直到最早的窗口用完,然后取新窗口
  • 也可以使用dense_rank(参见下面的备选方案(
  • 或者

    df %>% mutate(b_len = runner::runner(x = ClickedDate,
    idx = ClickedDate,
    k = "15 mins",
    lag = "-14 mins",
    f = length),
    b_no = purrr::accumulate(seq_len(length(b_len)-1), .init = b_len[1], ~ifelse(.x > .y, .x, .x + b_len[.x +1])),
    b_no = dense_rank(b_no)) %>%
    group_by(b_no) %>%
    mutate(b_len = n()) %>%
    ungroup()
    # A tibble: 15 x 3
    ClickedDate         b_len  b_no
    <dttm>              <int> <int>
    1 2021-01-01 00:16:38     1     1
    2 2021-01-01 01:02:59     1     2
    3 2021-01-01 01:38:08     1     3
    4 2021-01-01 04:44:30     1     4
    5 2021-01-01 05:21:19     1     5
    6 2021-01-01 05:41:07     1     6
    7 2021-01-01 09:24:41     1     7
    8 2021-01-01 11:06:27     1     8
    9 2021-01-01 13:25:06     1     9
    10 2021-01-01 14:23:09     2    10
    11 2021-01-01 14:32:02     2    10
    12 2021-01-01 14:38:14     3    11
    13 2021-01-01 14:43:14     3    11
    14 2021-01-01 14:46:17     3    11
    15 2021-01-01 15:21:14     1    12
    

    数据使用

    df
    > df
    # A tibble: 15 x 1
    ClickedDate        
    <dttm>             
    1 2021-01-01 00:16:38
    2 2021-01-01 01:02:59
    3 2021-01-01 01:38:08
    4 2021-01-01 04:44:30
    5 2021-01-01 05:21:19
    6 2021-01-01 05:41:07
    7 2021-01-01 09:24:41
    8 2021-01-01 11:06:27
    9 2021-01-01 13:25:06
    10 2021-01-01 14:23:09
    11 2021-01-01 14:32:02
    12 2021-01-01 14:38:14
    13 2021-01-01 14:43:14
    14 2021-01-01 14:46:17
    15 2021-01-01 15:21:14
    

    最新更新