r语言 - 如何根据多个条件滞后日期,并在发生滞后重置滞后



我有一个车站维修的数据帧。

工作流程是这样的:机械师去一个车站,他们按下一个按钮,记录一个名为releaseaction。修复电台后,他们再次按下按钮,动作现在return

您可以在下面看到row 1row 2是一个已完成的任务,花了Jane Jetson10 秒完成。

dt        name foo_id foo_role bikeId station_name station_id  action
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic  12345   FooStation    1234.89 Release
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic  12345   FooStation    1234.89  Return
3 2019-12-12 13:06:16    John Doe 106338 Mechanic  12345   FooStation    1234.89 Release
4 2019-12-12 13:06:19    John Doe 106338 Mechanic  12345   FooStation    1234.89  Return
5 2019-12-12 13:07:16    John Doe 106338 Mechanic  12345   FooStation    1234.89 Release
6 2019-12-12 14:07:16    John Doe 106338 Mechanic  56789 Some Station    4567.12 Release

我想发生什么:

  • 我想知道每个mechanic使用actionRelease修复车站,然后遵循Return需要多长时间。
  • 如果一个Release没有Return,我想取Sys.time()并从dt中减去它。你会看到row 5row 6

我这样做了:(我不是 100% 确定我需要上一个操作,但我包括在内以防万一。

library(dplyr)
library(tidyr)
foo = arrange(foo, foo_id, name, foo_role, bikeId, station_id) %>% 
group_by(foo_id,name, foo_role, bikeId, station_name,station_id) %>%
mutate(prev_dt = lag(dt, order_by = foo_id), 
prev_action = lag(action, order_by=foo_id, default = 'NaN'))
foo$timediffsecs = as.numeric(difftime(foo$dt,foo$prev_dt,units='secs'))
> foo
# A tibble: 6 x 11
# Groups:   foo_id, name, foo_role, bikeId, station_name, station_id [3]
dt                  name        foo_id foo_role bikeId station_name station_id action  prev_dt             prev_action timediffsecs
<dttm>              <fct>        <int> <fct>     <int> <fct>             <dbl> <chr>   <dttm>              <chr>              <dbl>
1 2019-12-12 13:05:47 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Release NA                  NaN                   NA
2 2019-12-12 13:05:57 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Return  2019-12-12 13:05:47 Release               10
3 2019-12-12 13:06:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release NA                  NaN                   NA
4 2019-12-12 13:06:19 John Doe    106338 Mechanic  12345 FooStation        1235. Return  2019-12-12 13:06:16 Release                3
5 2019-12-12 13:07:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release 2019-12-12 13:06:19 Return                57
6 2019-12-12 14:07:16 John Doe    106338 Mechanic  56789 Some Station      4567. Release NA                  NaN                   NA

问题:

  1. row 5是一个新的周期,因为actionReleaseReturn以前发生过,但timediffsecs记录了57秒。 在row 5Prev_dtprev_action应该NAtimediffsecs=Sys.time() - dt.

  2. row 6应该有timediffsecs=Sys.time() - dt

我的想法可以工作:

我将prev_actionNA 更改为 NaN,所以我可以做一些 if else 语句,但我不太确定如何为此构建一个语句。我想将prev_dt中的 NA 更改为默认为dt但这样做时出现问题。我想尝试这样做的原因是我可以使用条件语句,但如果不需要,则无需更改 NA。

tl;DR:我希望timediffsecs记录正确的秒数。row 5row 6有问题。row 5应该是Sys.time() - dt.row 6我想返回Sys.time() - dt

数据:

structure(list(dt = structure(c(1576173947, 1576173957, 1576173976, 
1576173979, 1576174036, 1576177636), class = c("POSIXct", "POSIXt"
), tzone = ""), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Jane Jetson", 
"John Doe"), class = "factor"), foo_id = c(106337L, 106337L, 
106338L, 106338L, 106338L, 106338L), foo_role = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = "Mechanic", class = "factor"), 
bikeId = c(12345L, 12345L, 12345L, 12345L, 12345L, 56789L
), station_name = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("FooStation", 
"Some Station"), class = "factor"), station_id = c(1234.89, 
1234.89, 1234.89, 1234.89, 1234.89, 4567.12), action = c("Release", 
"Return", "Release", "Return", "Release", "Release")), row.names = c(NA, 
-6L), class = "data.frame")

使用dplyr的一种方法可能是为每个name(或foo_id(以及每次发生"Release"时创建组。在这个组中,我们计算'Return''Release'时间之间的差异,如果存在'Return''Release'时间和当前时间之间的差异。

library(dplyr)
df %>%
group_by(name, group = cumsum(action == "Release")) %>%
mutate(timediffsecs = if (any(action == 'Return')) 
dt[action == 'Return'] - dt[action == 'Release'] else Sys.time() - dt, 
#If we want to replace Release values with NA
timediffsecs  = replace(timediffsecs, n() > 1 & action == 'Release', NA))

#   dt                  name        foo_id foo_role bikeId station_name station_id action  group timediffsecs
#  <dttm>              <fct>        <int> <fct>     <int> <fct>             <dbl> <chr>   <int> <drtn>      
#1 2019-12-13 02:05:47 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Release     1     NA secs 
#2 2019-12-13 02:05:57 Jane Jetson 106337 Mechanic  12345 FooStation        1235. Return      1     10 secs 
#3 2019-12-13 02:06:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release     2     NA secs 
#4 2019-12-13 02:06:19 John Doe    106338 Mechanic  12345 FooStation        1235. Return      2      3 secs 
#5 2019-12-13 02:07:16 John Doe    106338 Mechanic  12345 FooStation        1235. Release     3 472603 secs 
#6 2019-12-13 03:07:16 John Doe    106338 Mechanic  56789 Some Station      4567. Release     4 469003 secs 

虽然这适用于给定的示例,但可能需要根据数据进行一些调整。

最新更新