R: Return the next occurrence of a value by group AND from a subset of the data (conditional lead/lag)



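My question is a continuation/modification of this question: Return next occurrence value based on multiple columns

I am working with the following data: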

df<-structure(list(firm = c("A", "A", "B", "B", "B", "B", "B", "C", 
"C", "C"), datetime = structure(c(1514793600, 1514799000, 1514793600, 
1514797200, 1514800800, 1514804100, 1514804400, 1514800800, 1514802600, 
1514802900), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))

For every row where employee equals 1, I would like to return the next datetime at which the same firm appears in the data AND employee equals 0.

df_expected<-structure(list(firm = c("A", "A", "B", "B", "B", "B", "B", "C", 
"C", "C"), datetime = structure(c(1514793600, 1514799000, 1514793600, 
1514797200, 1514800800, 1514804100, 1514804400, 1514800800, 1514802600, 
1514802900), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L), NextTime = structure(c(1514799000, 
NA, 1514797200, NA, 1514804400, 1514804400, 
NA, NA, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), row.names = c(NA, -10L), class = "data.frame")

I tried this with dplyr, which only works if there is no more than one employee == 0 per firm:

df %>%
group_by(firm) %>%
mutate(nextTime=datetime[employee==0])

or if there is no more than one employee == 1 per firm:

df %>%
group_by(firm) %>%
mutate(nextTime=lead(datetime))

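I tried many times to "combine" the snippets above with the data.table answers to the original question, all in vain. I would really appreciate your help!

Try this: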

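library(dplyr)

df_expected %>%
  group_by(firm) %>%
  # keep the next row's datetime only when the next row has employee == 0
  mutate(NextTime2 = if_else(lead(employee == 0), lead(datetime), datetime[NA])) %>%
  # carry each found datetime upwards within the firm
  tidyr::fill(NextTime2, .direction = "up") %>%
  # rows that are themselves employee == 0 get NA
  mutate(NextTime2 = if_else(employee == 0, NextTime2[NA], NextTime2)) %>%
  ungroup()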
# # A tibble: 10 x 5
#    firm  datetime            employee NextTime            NextTime2          
#    <chr> <dttm>                 <int> <dttm>              <dttm>             
#  1 A     2018-01-01 08:00:00        1 2018-01-01 09:30:00 2018-01-01 09:30:00
#  2 A     2018-01-01 09:30:00        0 NA                  NA                 
#  3 B     2018-01-01 08:00:00        1 2018-01-01 09:00:00 2018-01-01 09:00:00
#  4 B     2018-01-01 09:00:00        0 NA                  NA                 
#  5 B     2018-01-01 10:00:00        1 2018-01-01 11:00:00 2018-01-01 11:00:00
#  6 B     2018-01-01 10:55:00        1 2018-01-01 11:00:00 2018-01-01 11:00:00
#  7 B     2018-01-01 11:00:00        0 NA                  NA                 
#  8 C     2018-01-01 10:00:00        1 NA                  NA                 
#  9 C     2018-01-01 10:30:00        1 NA                  NA                 
# 10 C     2018-01-01 10:35:00        1 NA                  NA                 
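
One way to read the pipeline above is to look at its intermediate results for a single firm. The sketch below rebuilds only firm B from the question's data and stores each stage in its own column. The names b, out, step1, step2 and step3 are made up for illustration, and it assumes dplyr and tidyr are installed.

```r
library(dplyr)

# Firm B only, copied from the question's data.
b <- data.frame(
  datetime = as.POSIXct(c(1514793600, 1514797200, 1514800800,
                          1514804100, 1514804400),
                        origin = "1970-01-01", tz = "UTC"),
  employee = c(1L, 0L, 1L, 1L, 0L)
)

out <- b %>%
  # Step 1: keep the next row's time only when the next row has employee == 0.
  mutate(step1 = if_else(lead(employee == 0), lead(datetime), datetime[NA])) %>%
  # Step 2: copy step1, then carry each found time upwards over the NA rows.
  mutate(step2 = step1) %>%
  tidyr::fill(step2, .direction = "up") %>%
  # Step 3: blank out the rows that are themselves employee == 0.
  mutate(step3 = if_else(employee == 0, step2[NA], step2))

out
```

For firm B, step1 is filled only in rows 1 and 4 (the rows directly before an employee == 0 row), step2 spreads the 11:00 value up over rows 2 and 3, and step3 removes it again from row 2 because that row has employee == 0.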

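FYI: the [NA] indexing is a trick to make sure the true= and false= vectors have the same class. If I use a bare NA it fails, because NA is of class logical. (This is the behavior of the dplyr version used here; as far as I can tell, dplyr 1.1.0 and later cast both arguments to a common type and accept a bare NA.)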

if_else(TRUE, 1, NA)
# Error in `if_else()`:
# ! `false` must be a double vector, not a logical vector.

By indexing with [NA], we guarantee that the result has the appropriate class (there are more than six different types of NA):

(1:3)[NA]
# [1] NA NA NA
class( (1:3)[NA] )
# [1] "integer"
#### many types of `NA`
class(NA)
# [1] "logical"
class( (seq(1,3,by=0.5))[NA] )
# [1] "numeric"
class( letters[NA] )
# [1] "character"
class( Sys.time()[NA] )
# [1] "POSIXct" "POSIXt" 
class( Sys.Date()[NA] )
# [1] "Date"
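
As a side note, base R also ships typed NA constants for the atomic types, which can stand in for the indexing trick when the class is known in advance. There is no such constant for POSIXct or Date, so the sketch below uses an explicit conversion for the date-time case; the variable name na_time is made up for illustration.

```r
# Typed NA constants that ship with base R.
class(NA_integer_)
# [1] "integer"
class(NA_real_)
# [1] "numeric"
class(NA_character_)
# [1] "character"

# No NA_POSIXct_ constant exists; an explicit conversion is one alternative.
na_time <- as.POSIXct(NA, tz = "UTC")
class(na_time)
# [1] "POSIXct" "POSIXt"
```

The datetime[NA] form used in the answer has the advantage that it copies the class and time zone from the column itself, so nothing has to be spelled out by hand.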
