R: Insert values into new columns based on two conditions involving other columns



I have a dataset like this:

at_ID   journey_id  flight  is_origin  is_destination  is_outbound
1       1           1       NA         NA              TRUE
2       1           2       NA         NA              TRUE
3       1           3       NA         NA              FALSE
4       1           4       NA         NA              FALSE
5       2           1       NA         NA              FALSE
6       3           1       NA         NA              TRUE
7       3           2       NA         NA              FALSE
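
For reproducibility, the table above could be built as a data frame along the following lines. This is only a sketch: the integer and logical column types are assumed from the tibble header shown in the answer further down, and `df` is simply the name the answer code uses.

```r
# Sample data from the question; is_origin and is_destination are
# placeholders (NA) that still need to be filled in
df <- data.frame(
  at_ID          = 1:7,
  journey_id     = c(1L, 1L, 1L, 1L, 2L, 3L, 3L),
  flight         = c(1L, 2L, 3L, 4L, 1L, 1L, 2L),
  is_origin      = NA,
  is_destination = NA,
  is_outbound    = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

str(df)
```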

The columns is_origin and is_destination must be filled with TRUE/FALSE according to the following conditions:

# first condition
is_origin      = TRUE if flight == min(flight) AND is_outbound == TRUE
is_destination = TRUE if flight == max(flight) AND is_outbound == TRUE
# second condition
is_origin      = TRUE if flight == min(flight) AND is_outbound == FALSE
is_destination = TRUE if flight == max(flight) AND is_outbound == FALSE

The output should look like this:

at_ID   journey_id  flight  is_origin  is_destination  is_outbound
1       1           1       TRUE       FALSE           TRUE
2       1           2       FALSE      TRUE            TRUE
3       1           3       TRUE       FALSE           FALSE
4       1           4       FALSE      TRUE            FALSE
5       2           1       TRUE       TRUE            FALSE
6       3           1       TRUE       FALSE           TRUE
7       3           2       FALSE      TRUE            FALSE

Is there an efficient way to do this?

Note that is_origin for row 5 should be FALSE:

library(dplyr)

# No grouping here: min() and max() are taken over the whole flight column
df %>%
  mutate(is_origin = flight == min(flight) & is_outbound == TRUE,
         is_destination = flight == max(flight) & is_outbound == FALSE)
  at_ID journey_id flight is_origin is_destination is_outbound
1     1          1      1      TRUE          FALSE        TRUE
2     2          1      2     FALSE          FALSE        TRUE
3     3          1      3     FALSE          FALSE       FALSE
4     4          1      4     FALSE           TRUE       FALSE
5     5          2      1     FALSE          FALSE       FALSE
6     6          3      1      TRUE          FALSE        TRUE
7     7          3      2     FALSE          FALSE       FALSE

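If we group by journey_id, which seems necessary, then row 2 (is_destination) and row 3 (is_origin) cannot be TRUE, because within their journey they hold neither the minimum nor the maximum flight.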

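library(dplyr)

# (is_outbound == TRUE | is_outbound == FALSE) holds for every non-NA value,
# so in practice only the per-journey min/max of flight decides the result
df %>%
  group_by(journey_id) %>%
  mutate(is_origin = flight == min(flight) & (is_outbound == TRUE | is_outbound == FALSE),
         is_destination = flight == max(flight) & (is_outbound == TRUE | is_outbound == FALSE)) %>%
  ungroup()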
# A tibble: 7 × 6
  at_ID journey_id flight is_origin is_destination is_outbound
  <int>      <int>  <int> <lgl>     <lgl>          <lgl>
1     1          1      1 TRUE      FALSE          TRUE
2     2          1      2 FALSE     FALSE          TRUE
3     3          1      3 FALSE     FALSE          FALSE
4     4          1      4 FALSE     TRUE           FALSE
5     5          2      1 TRUE      TRUE           FALSE
6     6          3      1 TRUE      FALSE          TRUE
7     7          3      2 FALSE     TRUE           FALSE
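
Another possible reading of the two conditions, not taken from the code above, is that origin and destination are decided per leg, that is, per combination of journey_id and is_outbound. The sketch below assumes that reading and rebuilds the sample data so that it runs on its own. The name `res` is just an illustrative choice.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  at_ID          = 1:7,
  journey_id     = c(1L, 1L, 1L, 1L, 2L, 3L, 3L),
  flight         = c(1L, 2L, 3L, 4L, 1L, 1L, 2L),
  is_origin      = NA,
  is_destination = NA,
  is_outbound    = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Assumption: min/max of flight are evaluated within each
# (journey_id, is_outbound) group rather than within the whole journey
res <- df %>%
  group_by(journey_id, is_outbound) %>%
  mutate(is_origin      = flight == min(flight),
         is_destination = flight == max(flight)) %>%
  ungroup()

print(res)
```

Under this assumption, rows 1 to 5 match the desired output in the question. Rows 6 and 7 do not: each is the only flight in its leg, so both flags come out TRUE for both rows. The desired output as posted therefore does not seem to follow from either grouping alone.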
