I have a dataset like this:
at_ID journey_id flight is_origin is_destination is_outbound
1 1 1 NA NA TRUE
2 1 2 NA NA TRUE
3 1 3 NA NA FALSE
4 1 4 NA NA FALSE
5 2 1 NA NA FALSE
6 3 1 NA NA TRUE
7 3 2 NA NA FALSE
The columns is_origin and is_destination must be filled with TRUE/FALSE according to the following conditions:
#first condition
is_origin = TRUE if min(flight) AND is_outbound = TRUE
is_destination = TRUE if max(flight) AND is_outbound = TRUE
#second condition
is_origin = TRUE if min(flight) AND if is_outbound = FALSE
is_destination = TRUE if max(flight) AND if is_outbound = FALSE
The output should look like this:
at_ID journey_id flight is_origin is_destination is_outbound
1 1 1 TRUE FALSE TRUE
2 1 2 FALSE TRUE TRUE
3 1 3 TRUE FALSE FALSE
4 1 4 FALSE TRUE FALSE
5 2 1 TRUE TRUE FALSE
6 3 1 TRUE FALSE TRUE
7 3 2 FALSE TRUE FALSE
Is there an efficient way to do this?
Note that is_origin in row 5 should be FALSE.
library(dplyr)

# Ungrouped: min() and max() are taken over the whole flight column
df %>%
  mutate(is_origin = flight == min(flight) & is_outbound,
         is_destination = flight == max(flight) & !is_outbound)
at_ID journey_id flight is_origin is_destination is_outbound
1 1 1 1 TRUE FALSE TRUE
2 2 1 2 FALSE FALSE TRUE
3 3 1 3 FALSE FALSE FALSE
4 4 1 4 FALSE TRUE FALSE
5 5 2 1 FALSE FALSE FALSE
6 6 3 1 TRUE FALSE TRUE
7 7 3 2 FALSE FALSE FALSE
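If we group by journey_id, which seems necessary, then row 2 (is_destination) and row 3 (is_origin) cannot be TRUE, because they are neither the minimum nor the maximum flight of their journey.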
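library(dplyr)

df %>%
  group_by(journey_id) %>%
  # (is_outbound == TRUE | is_outbound == FALSE) holds for every non-missing
  # is_outbound, so only the per-journey min/max of flight decides the result
  mutate(is_origin = flight == min(flight) & (is_outbound == TRUE | is_outbound == FALSE),
         is_destination = flight == max(flight) & (is_outbound == TRUE | is_outbound == FALSE)) %>%
  ungroup()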
# A tibble: 7 × 6
at_ID journey_id flight is_origin is_destination is_outbound
<int> <int> <int> <lgl> <lgl> <lgl>
1 1 1 1 TRUE FALSE TRUE
2 2 1 2 FALSE FALSE TRUE
3 3 1 3 FALSE FALSE FALSE
4 4 1 4 FALSE TRUE FALSE
5 5 2 1 TRUE TRUE FALSE
6 6 3 1 TRUE FALSE TRUE
7 7 3 2 FALSE TRUE FALSE
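If the two conditions are read as "first and last flight within each direction of a journey", one possible variant is to group by both journey_id and is_outbound. This reading is an assumption and is not confirmed in the question. The sketch below rebuilds the sample data so that it runs on its own. With this grouping, rows 1 to 5 match the desired output, but rows 6 and 7 each form a single-flight group and therefore get TRUE in both columns, which differs from the desired output for journey 3.

```r
library(dplyr)

# Sample data rebuilt from the question (is_origin / is_destination are computed below)
df <- data.frame(
  at_ID       = 1:7,
  journey_id  = c(1, 1, 1, 1, 2, 3, 3),
  flight      = c(1, 2, 3, 4, 1, 1, 2),
  is_outbound = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Assumption: origin and destination are decided per journey AND per direction
res <- df %>%
  group_by(journey_id, is_outbound) %>%
  mutate(is_origin      = flight == min(flight),
         is_destination = flight == max(flight)) %>%
  ungroup()

res
```

Whether to group by journey_id alone or by journey_id and is_outbound depends on how a one-flight direction such as rows 6 and 7 should be treated, so it is worth checking both variants against the real data.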