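My question is a continuation/modification of this question: Return next occurrence of value based on multiple columns
I am working with the following data: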
df<-structure(list(firm = c("A", "A", "B", "B", "B", "B", "B", "C",
"C", "C"), datetime = structure(c(1514793600, 1514799000, 1514793600,
1514797200, 1514800800, 1514804100, 1514804400, 1514800800, 1514802600,
1514802900), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
For every row where `employee` equals 1, I want to return the next `datetime` at which the same `firm` appears in the data with `employee` equal to 0.
df_expected<-structure(list(firm = c("A", "A", "B", "B", "B", "B", "B", "C",
"C", "C"), datetime = structure(c(1514793600, 1514799000, 1514793600,
1514797200, 1514800800, 1514804100, 1514804400, 1514800800, 1514802600,
1514802900), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L), NextTime = structure(c(1514799000,
NA, 1514797200, NA, 1514804400, 1514804400,
NA, NA, NA, NA), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -10L), class = "data.frame")
I tried this with `dplyr`, which only works if each `firm` has no more than one row with `employee == 0`:
df %>%
group_by(firm) %>%
mutate(nextTime=datetime[employee==0])
or if each `firm` has no more than one row with `employee == 1`:
df %>%
group_by(firm) %>%
mutate(nextTime=lead(datetime))
I tried many times to "combine" the snippets above with the `data.table` answers to the original question, all in vain. I would really appreciate your help!
Try this:
df_expected %>%
group_by(firm) %>%
mutate(NextTime2 = if_else(lead(employee == 0), lead(datetime), datetime[NA])) %>%
tidyr::fill(NextTime2, .direction = "up") %>%
mutate(NextTime2 = if_else(employee == 0, NextTime2[NA], NextTime2)) %>%
ungroup()
# # A tibble: 10 x 5
# firm datetime employee NextTime NextTime2
# <chr> <dttm> <int> <dttm> <dttm>
# 1 A 2018-01-01 08:00:00 1 2018-01-01 09:30:00 2018-01-01 09:30:00
# 2 A 2018-01-01 09:30:00 0 NA NA
# 3 B 2018-01-01 08:00:00 1 2018-01-01 09:00:00 2018-01-01 09:00:00
# 4 B 2018-01-01 09:00:00 0 NA NA
# 5 B 2018-01-01 10:00:00 1 2018-01-01 11:00:00 2018-01-01 11:00:00
# 6 B 2018-01-01 10:55:00 1 2018-01-01 11:00:00 2018-01-01 11:00:00
# 7 B 2018-01-01 11:00:00 0 NA NA
# 8 C 2018-01-01 10:00:00 1 NA NA
# 9 C 2018-01-01 10:30:00 1 NA NA
# 10 C 2018-01-01 10:35:00 1 NA NA
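For anyone who wants to check the logic without `dplyr`/`tidyr`, here is a minimal base R sketch of the same idea. It is not the pipeline above, just a backward scan per firm that remembers the most recent `employee == 0` time. The helper name `next_zero_time` is made up for illustration, and the sketch assumes rows are already sorted by `datetime` within each `firm`.

```r
# Same data as in the question
df <- data.frame(
  firm = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  datetime = as.POSIXct(
    c(1514793600, 1514799000, 1514793600, 1514797200, 1514800800,
      1514804100, 1514804400, 1514800800, 1514802600, 1514802900),
    origin = "1970-01-01", tz = "UTC"),
  employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)
)

# Hypothetical helper: for one firm, walk from the last row to the first,
# remembering the datetime of the nearest following employee == 0 row.
# Assumes the rows are sorted by datetime.
next_zero_time <- function(datetime, employee) {
  n   <- length(datetime)
  out <- datetime[rep(NA_integer_, n)]  # all-NA vector of the same class
  nxt <- datetime[NA_integer_]          # "no zero row seen yet"
  for (i in rev(seq_len(n))) {
    if (employee[i] == 0) {
      nxt <- datetime[i]                # remember this zero row
    } else {
      out[i] <- nxt                     # employee == 1: next zero time, or NA
    }
  }
  out
}

# Apply the helper firm by firm
df$NextTime <- df$datetime[rep(NA_integer_, nrow(df))]
for (f in unique(df$firm)) {
  idx <- which(df$firm == f)
  df$NextTime[idx] <- next_zero_time(df$datetime[idx], df$employee[idx])
}
df
```

The loop is O(n) per firm, which should be fine for modest data; for large data the grouped `dplyr` or `data.table` approaches are likely the better choice.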
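FYI: the `[NA]` indexing is a trick to make sure the `true=` and `false=` vectors have the same class. If I use a bare `NA`, it fails (at least in the dplyr version used here), because `NA` is of class `logical`: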
if_else(TRUE, 1, NA)
# Error in `if_else()`:
# ! `false` must be a double vector, not a logical vector.
By indexing with `[NA]`, we guarantee that the result has the appropriate class (there are more than 6 different kinds of `NA`):
(1:3)[NA]
# [1] NA NA NA
class( (1:3)[NA] )
# [1] "integer"
#### many types of `NA`
class(NA)
# [1] "logical"
class( (seq(1,3,by=0.5))[NA] )
# [1] "numeric"
class( letters[NA] )
# [1] "character"
class( Sys.time()[NA] )
# [1] "POSIXct" "POSIXt"
class( Sys.Date()[NA] )
# [1] "Date"
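As a side note, base R also ships typed `NA` constants for the atomic types, which can stand in for the `[NA]` trick when the target type is known up front. As far as I know there is no built-in constant for `Date` or `POSIXct`, so coercing `NA` is one way to get those; the variable names `na_date` and `na_time` below are just for illustration.

```r
# Typed NA constants from base R
class(NA_integer_)
class(NA_real_)
class(NA_character_)
class(NA_complex_)

# No dedicated constant for dates/times (assumption: coercion is the usual route)
na_date <- as.Date(NA)
na_time <- as.POSIXct(NA)
class(na_date)
class(na_time)
```

The `[NA]` indexing has the advantage of working for any vector without needing to know its class, and it keeps attributes such as the time zone.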