How to calculate the difference between timestamps under a given condition



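I want to determine the difference between timestamps per userID. I only want to measure the difference for users that have both a login and a logout status. Some users have only a login or only a logout. For those, I just want to mark the result as NA: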

Some data:

  library(dplyr)
  start <- as.POSIXct("2012-01-15")
  interval <- 70
  end <- start + as.difftime(1, units="days")
  tseq<- seq(from=start, by=interval*70, to=end)
  employeID <-c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
  status<- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
  # put together
  data <- data.frame(tseq, employeID, status)
  #                  tseq employeID status
  #1  2012-01-15 00:00:00       1_e  login
  #2  2012-01-15 01:21:40       1_e logout
  #3  2012-01-15 02:43:20       2_b  login
  #4  2012-01-15 04:05:00       2_b logout
  #5  2012-01-15 05:26:40       3_c  login
  #6  2012-01-15 06:48:20       3_c logout
  #7  2012-01-15 08:10:00     100_c  login
  #8  2012-01-15 09:31:40       4_d logout
  #9  2012-01-15 10:53:20       4_d  login
  #10 2012-01-15 12:15:00      52_f logout
  #11 2012-01-15 13:36:40       9_f  login
  #12 2012-01-15 14:58:20       9_f logout
  #13 2012-01-15 16:20:00       7_u  login
  #14 2012-01-15 17:41:40       7_u logout
  #15 2012-01-15 19:03:20      10_5 logout
  #16 2012-01-15 20:25:00      22_2  login
  #17 2012-01-15 21:46:40      33_a logout
  #18 2012-01-15 23:08:20      33_a  login  

  test<- data %>% 
    group_by(employeID) %>% 
    mutate(time.difference = tseq - lag(tseq))

However, this seems to produce only a constant time difference.

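How about this? Mainly, it looks like you are using mutate where you want summarise. In addition, I convert the status column from factor to character and include an ifelse statement so that a difference is only computed for users that have both a "login" and a "logout" entry: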

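test <- data %>% 
    # make sure status is character rather than factor
    mutate( status = as.character( status ) ) %>%
    group_by( employeID ) %>% 
    # one row per user: hours from login to logout, or NA if either entry is missing
    summarise( time.difference = ifelse( "login" %in% status && "logout" %in% status, 
                                         difftime( tseq[ status == "logout" ], tseq[ status == "login" ], units = "hours" ), 
                                         NA ) 
    )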
This gives:

> head( test )
# A tibble: 6 × 2
employeID time.difference
      <fctr>           <dbl>
1       1_e        1.361111
2      10_5              NA
3     100_c              NA
4       2_b        1.361111
5      22_2              NA
6       3_c        1.361111

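As others have suggested, your data really does contain constant time intervals, so the difference is always the same wherever both values exist. I assume your actual data looks a bit different, so you will get more meaningful output.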
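To illustrate that point, here is a minimal base R sketch with irregular timestamps. The values in `sessions` and the helper `session_hours` are made up for illustration and are not part of the question's data; `tz = "UTC"` is likewise an assumption to keep the result independent of the local time zone.

```r
# Hypothetical log with irregular timestamps (made-up values)
sessions <- data.frame(
  tseq = as.POSIXct(c("2012-01-15 08:00:00", "2012-01-15 08:45:00",
                      "2012-01-15 09:10:00", "2012-01-15 11:40:00",
                      "2012-01-15 12:00:00"), tz = "UTC"),
  employeID = c("1_e", "1_e", "2_b", "2_b", "100_c"),
  status = c("login", "logout", "login", "logout", "login"),
  stringsAsFactors = FALSE
)

# Hours between the first login and first logout of one user,
# or NA if either entry is missing (hypothetical helper)
session_hours <- function(d) {
  if (!all(c("login", "logout") %in% d$status)) return(NA_real_)
  as.numeric(difftime(d$tseq[d$status == "logout"][1],
                      d$tseq[d$status == "login"][1],
                      units = "hours"))
}

# Apply the helper to each user's rows; the result is a named numeric vector
result <- sapply(split(sessions, sessions$employeID), session_hours)
result[["1_e"]]    # 0.75
result[["2_b"]]    # 2.5
result[["100_c"]]  # NA
```

With non-constant gaps in the input, the per-user differences are no longer identical.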

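We first filter out the groups with unpaired statuses by checking the number of distinct statuses in each group. With dplyr::do we then compute the time difference for each remaining group.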

  library(dplyr)
  start <- as.POSIXct("2012-01-15")
  interval <- 70
  end <- start + as.difftime(1, units="days")
  tseq<- seq(from=start, by=interval*70, to=end)
  employeID <-c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
  status<- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
  # put together
  DF <- data.frame(tseq, employeID, status)
  #                  tseq employeID status
  #1  2012-01-15 00:00:00       1_e  login
  #2  2012-01-15 01:21:40       1_e logout
  #3  2012-01-15 02:43:20       2_b  login
  #4  2012-01-15 04:05:00       2_b logout
  #5  2012-01-15 05:26:40       3_c  login
  #6  2012-01-15 06:48:20       3_c logout
  #7  2012-01-15 08:10:00     100_c  login
  #8  2012-01-15 09:31:40       4_d logout
  #9  2012-01-15 10:53:20       4_d  login
  #10 2012-01-15 12:15:00      52_f logout
  #11 2012-01-15 13:36:40       9_f  login
  #12 2012-01-15 14:58:20       9_f logout
  #13 2012-01-15 16:20:00       7_u  login
  #14 2012-01-15 17:41:40       7_u logout
  #15 2012-01-15 19:03:20      10_5 logout
  #16 2012-01-15 20:25:00      22_2  login
  #17 2012-01-15 21:46:40      33_a logout
  #18 2012-01-15 23:08:20      33_a  login  

  testDF <- DF %>% 
    dplyr::group_by(employeID) %>%
    # keep only users that have both a login and a logout row
    dplyr::filter(dplyr::n_distinct(status) > 1) %>% 
    # per user: login time, logout time and their difference in seconds
    dplyr::do(., data.frame(logINTime = .$tseq[.$status == "login"],
                            logOUTTime = .$tseq[.$status == "logout"],
                            deltaTime = difftime(.$tseq[.$status == "logout"],
                                                 .$tseq[.$status == "login"], units = "secs"))) %>%
    as.data.frame()

testDF
  # employeID           logINTime          logOUTTime deltaTime
# 1       1_e 2012-01-15 00:00:00 2012-01-15 01:21:40      4900
# 2       2_b 2012-01-15 02:43:20 2012-01-15 04:05:00      4900
# 3       3_c 2012-01-15 05:26:40 2012-01-15 06:48:20      4900
# 4      33_a 2012-01-15 23:08:20 2012-01-15 21:46:40     -4900
# 5       4_d 2012-01-15 10:53:20 2012-01-15 09:31:40     -4900
# 6       7_u 2012-01-15 16:20:00 2012-01-15 17:41:40      4900
# 7       9_f 2012-01-15 13:36:40 2012-01-15 14:58:20      4900
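
In the output above, 33_a and 4_d have a negative deltaTime because their logout row comes before their login row in the sample data. If such reversed pairs should be reported as NA instead, one possible sketch follows. This is an assumption about the desired behaviour, not something the question specifies, and the `pairs` data frame only rebuilds two rows of the result by hand, with `tz = "UTC"` assumed.

```r
# Rebuild two rows of the result: one regular pair, one reversed pair
pairs <- data.frame(
  employeID = c("1_e", "4_d"),
  logINTime = as.POSIXct(c("2012-01-15 00:00:00", "2012-01-15 10:53:20"), tz = "UTC"),
  logOUTTime = as.POSIXct(c("2012-01-15 01:21:40", "2012-01-15 09:31:40"), tz = "UTC")
)

# Difference in seconds as a plain number
pairs$deltaTime <- as.numeric(difftime(pairs$logOUTTime, pairs$logINTime, units = "secs"))

# Treat a logout that precedes its login as unpaired
pairs$deltaTime[pairs$deltaTime < 0] <- NA

pairs$deltaTime  # 4900 NA
```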

This line seems to create a constant time interval:

tseq<- seq(from=start, by=interval*70, to=end)

So when you take the differences again, won't they be constant?
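
A quick way to check this, as a minimal sketch that reuses the question's own construction of `tseq`: `diff()` returns the gaps between consecutive timestamps, and every gap should equal `interval * 70`, that is 4900 seconds. The `tz = "UTC"` argument is an addition here, assumed only to keep the result independent of the local time zone.

```r
start <- as.POSIXct("2012-01-15", tz = "UTC")
interval <- 70
end <- start + as.difftime(1, units = "days")
tseq <- seq(from = start, by = interval * 70, to = end)

# Gaps between consecutive timestamps, converted to seconds
gaps <- as.numeric(diff(tseq), units = "secs")

length(tseq)  # 18 timestamps
unique(gaps)  # a single value: 4900
```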
